[Beijing website production] PHP kernel introduction and extension development guide - basic knowledge

Date:2012-10-22 Source: Shangpin China Type: website encyclopedia


1. Basic knowledge

This chapter briefly introduces some internal mechanisms of the Zend engine. They are closely related to extensions, and understanding them also helps us write more efficient PHP code.

   1.1 Storage of PHP variables

   1.1.1 Zval structure

Zend uses the zval structure to store the values of PHP variables, as shown below:

  typedef union _zvalue_value {
      long lval;                /* long value */
      double dval;              /* double value */
      struct {
          char *val;
          int len;
      } str;
      HashTable *ht;            /* hash table value */
      zend_object_value obj;
  } zvalue_value;

  struct _zval_struct {
      /* Variable information */
      zvalue_value value;       /* value */
      zend_uint refcount;
      zend_uchar type;          /* active type */
      zend_uchar is_ref;
  };

  typedef struct _zval_struct zval;

Zend determines which member of value to access according to the value of type. The possible values are as follows:

  Type          Corresponding member
  IS_NULL       N/A
  IS_LONG       value.lval
  IS_DOUBLE     value.dval
  IS_STRING     value.str
  IS_ARRAY      value.ht
  IS_OBJECT     value.obj
  IS_BOOL       value.lval
  IS_RESOURCE   value.lval

Two interesting points follow from this table. First, a PHP array is actually a HashTable, which explains why PHP can support associative arrays. Second, a resource is a long value. It usually stores a pointer, an index into some internal array, or something else that only its creator understands, so it can be regarded as a handle.
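The way the type field selects a member of value can be sketched in plain C. This is only an illustration, not Zend code: sketch_zval, the SK_* tags and sketch_to_string are hypothetical names, and only a few of the types from the table are modelled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Zend's type tags and zval. */
enum sketch_type { SK_NULL, SK_LONG, SK_DOUBLE, SK_STRING, SK_BOOL };

typedef struct {
    union {
        long lval;
        double dval;
        struct { const char *val; int len; } str;
    } value;
    unsigned char type;   /* selects the active member of value */
} sketch_zval;

/* Format the value into buf by switching on the type tag. */
static int sketch_to_string(const sketch_zval *z, char *buf, size_t size)
{
    switch (z->type) {
    case SK_NULL:
        return snprintf(buf, size, "NULL");
    case SK_LONG:
        return snprintf(buf, size, "%ld", z->value.lval);
    case SK_DOUBLE:
        return snprintf(buf, size, "%g", z->value.dval);
    case SK_STRING:
        return snprintf(buf, size, "%.*s", z->value.str.len, z->value.str.val);
    case SK_BOOL:   /* booleans share lval, as in the table */
        return snprintf(buf, size, "%s", z->value.lval ? "true" : "false");
    default:
        return -1;
    }
}
```

Reading any member other than the one named by type would be meaningless, which is why every access has to go through the tag first.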

1.1.2 Reference count

Reference counting is widely used in garbage collection, memory pools, strings and elsewhere. Zend implements a typical form of it: multiple PHP variables can share the same zval through the reference counting mechanism. The two remaining members of zval, is_ref and refcount, support this sharing.

Obviously, refcount is the counter. It increases or decreases as references are added or removed, and once it drops to zero, Zend reclaims the zval.

What about is_ref?

1.1.3 zval status

PHP has two kinds of variables, reference and non-reference, and Zend stores both using reference counting. Non-reference variables must be independent of one another: modifying one must not affect any other. The copy-on-write mechanism reconciles this requirement with sharing. When a variable is about to be written and Zend finds that its zval is shared by several variables, it makes a copy of the zval with a refcount of 1 for that variable and decrements the refcount of the original zval. This process is called "zval separation". Reference variables have the opposite requirement: variables assigned by reference must stay bound together, so modifying one modifies all of them.

The zval therefore has to record which state it is in, so that the two cases can be handled differently. This is the purpose of is_ref, which indicates whether all the variables pointing to the zval were assigned by reference: either all of them are references or none are. When one of those variables is then modified, Zend performs copy-on-write only if the zval's is_ref is 0, that is, non-reference.
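The write path described above can be sketched as follows. This is a minimal model under stated assumptions, not the engine's implementation: sk_zval holds only a long, and sk_new, sk_share, sk_release and sk_write are hypothetical helper names.

```c
#include <stdlib.h>

/* Hypothetical miniature zval: a long payload plus the two bookkeeping fields. */
typedef struct {
    long value;
    unsigned int refcount;
    unsigned char is_ref;
} sk_zval;

static sk_zval *sk_new(long v)
{
    sk_zval *z = malloc(sizeof *z);
    if (!z) abort();
    z->value = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* A second variable starts pointing at the same zval. */
static sk_zval *sk_share(sk_zval *z)
{
    z->refcount++;
    return z;
}

/* A variable stops pointing at z; returns 1 if z was reclaimed. */
static int sk_release(sk_zval *z)
{
    if (--z->refcount == 0) {
        free(z);
        return 1;
    }
    return 0;
}

/* Write v through a variable; slot is the variable's pointer to its zval. */
static void sk_write(sk_zval **slot, long v)
{
    sk_zval *z = *slot;
    if (z->refcount > 1 && !z->is_ref) {
        /* Shared by non-reference variables: separate before writing. */
        sk_zval *copy = sk_new(z->value);
        z->refcount--;
        *slot = copy;
        z = copy;
    }
    z->value = v;   /* sole owner, or a reference set: write in place */
}
```

When is_ref is 1 the write lands in the shared zval, so every bound variable observes it; when is_ref is 0 and the zval is shared, only the writer moves to a fresh copy.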

1.1.4 zval state switching

When every assignment involving a zval is a reference assignment, or every one is a non-reference assignment, a single is_ref flag is enough. The real world is never that tidy, however, and PHP cannot impose such a restriction on users. When reference and non-reference assignments are mixed, special handling is required.

Scenario I: consider the following PHP code:

  $a = 1;
  $b = &$a;
  $c = &$b;
  $d = $c;    // a non-reference assignment inserted among reference assignments

The whole process is as follows:

The first three statements make a, b and c point to the same zval, with is_ref=1 and refcount=3. The fourth statement is a non-reference assignment, which would normally only need to increment the reference count. However, the target zval is in the reference state, so simply incrementing the count would be wrong: d would become bound to a, b and c. Zend's solution is to create a separate copy of the zval for d.
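Scenario I can be replayed with a small C model. It is a sketch only: sk_zval, sk_assign_value and sk_assign_ref are hypothetical names, and the separation step inside sk_assign_ref covers the mirror case (a reference assignment onto a zval shared by value), which the text above does not describe and is included here as an assumption.

```c
#include <stdlib.h>

typedef struct {
    long value;
    unsigned int refcount;
    unsigned char is_ref;
} sk_zval;

static sk_zval *sk_new(long v)
{
    sk_zval *z = malloc(sizeof *z);
    if (!z) abort();
    z->value = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $dst = $src; returns the zval that dst should point to. */
static sk_zval *sk_assign_value(sk_zval *src)
{
    if (src->is_ref) {
        /* src belongs to a reference set: dst must not join it, so copy. */
        return sk_new(src->value);
    }
    src->refcount++;   /* plain sharing, separated later on write */
    return src;
}

/* $dst = &$src; src_slot is the source variable's pointer to its zval. */
static sk_zval *sk_assign_ref(sk_zval **src_slot)
{
    sk_zval *src = *src_slot;
    if (!src->is_ref && src->refcount > 1) {
        /* Assumed mirror case: leave the by-value sharers behind first. */
        src->refcount--;
        src = sk_new(src->value);
        *src_slot = src;
    }
    src->is_ref = 1;
    src->refcount++;
    return src;
}
```

Running the four statements through these helpers leaves a, b and c on one zval with is_ref=1 and refcount=3, and d on its own zval with refcount=1.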

   1.1.5 Parameter passing

PHP function parameters are passed in the same way as variables are assigned: passing by value is equivalent to a non-reference assignment, and passing by reference is equivalent to a reference assignment. Parameter passing can therefore also trigger a zval state switch. This will be covered later.

   1.2 HashTable Structure

HashTable is the most important and widely used data structure in the Zend engine. It is used to store almost everything.

   1.2.1 Data structure

The HashTable data structure is defined as follows:

  typedef struct bucket {
      ulong h;                      // hash value
      uint nKeyLength;
      void *pData;                  // points to the value, which is a copy of the user data
      void *pDataPtr;
      struct bucket *pListNext;     // pListNext and pListLast form the doubly linked list
      struct bucket *pListLast;     // of the entire HashTable
      struct bucket *pNext;         // pNext and pLast form the doubly linked list
      struct bucket *pLast;         // of buckets that fall into the same hash slot
      char arKey[1];                // key
  } Bucket;

  typedef struct _hashtable {
      uint nTableSize;
      uint nTableMask;
      uint nNumOfElements;
      ulong nNextFreeElement;
      Bucket *pInternalPointer;     /* used for element traversal */
      Bucket *pListHead;
      Bucket *pListTail;
      Bucket **arBuckets;           // hash array
      dtor_func_t pDestructor;      // specified when the HashTable is initialized, called when a bucket is destroyed
      zend_bool persistent;         // whether to use the C memory allocation routines
      unsigned char nApplyCount;
      zend_bool bApplyProtection;
  #if ZEND_DEBUG
      int inconsistent;
  #endif
  } HashTable;

In general, Zend's HashTable is a chained hash table that is also optimized for linear traversal.

HashTable contains two data structures: a chained hash table and a doubly linked list. The former provides fast lookup by key, and the latter makes linear traversal and sorting convenient. Every bucket is a member of both structures.

Several explanations about the data structure:

Why is a doubly linked list used for the hash chains?

In general, a chained hash table is only operated on by key, so a singly linked list is enough. However, Zend sometimes needs to delete a given bucket from its chain, and with a doubly linked list this can be done efficiently, without walking the chain to find the predecessor.
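The benefit of the doubly linked chain can be shown with a short sketch. The field names pNext and pLast follow the Bucket definition above; sk_bucket, sk_chain_push and sk_chain_unlink are hypothetical names, and the bucket is reduced to the two chain pointers plus a payload.

```c
#include <stddef.h>

/* Reduced bucket: only the per-slot chain pointers and a payload. */
typedef struct sk_bucket {
    struct sk_bucket *pNext;
    struct sk_bucket *pLast;
    int value;
} sk_bucket;

/* Insert p at the head of the chain whose head pointer is *slot. */
static void sk_chain_push(sk_bucket **slot, sk_bucket *p)
{
    p->pLast = NULL;
    p->pNext = *slot;
    if (*slot) {
        (*slot)->pLast = p;
    }
    *slot = p;
}

/* Remove a known bucket p from its chain without traversing the chain. */
static void sk_chain_unlink(sk_bucket **slot, sk_bucket *p)
{
    if (p->pLast) {
        p->pLast->pNext = p->pNext;
    } else {
        *slot = p->pNext;          /* p was the head of the chain */
    }
    if (p->pNext) {
        p->pNext->pLast = p->pLast;
    }
    p->pNext = NULL;
    p->pLast = NULL;
}
```

With a singly linked chain, sk_chain_unlink would have to start from *slot and search for the bucket in front of p; here the predecessor is available directly through pLast.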

What does nTableMask do?

This value is used to convert hash values into arBuckets array subscripts. When initializing a HashTable, Zend first allocates the arBuckets array with nTableSize entries. nTableSize is the smallest power of two (2^n) that is not less than the size specified by the user, that is, 10* in binary (a 1 followed by zeros). nTableMask = nTableSize - 1, that is, 01* in binary (all ones below that bit). As a result, h & nTableMask always falls in [0, nTableSize - 1], and Zend uses it as the index into the arBuckets array.
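The arithmetic can be checked with a few lines of C. The helpers sk_table_size and sk_bucket_index are hypothetical names for this sketch, which also ignores any minimum table size the engine may enforce.

```c
#include <stddef.h>

/* Smallest power of two that is not less than the requested size. */
static unsigned int sk_table_size(unsigned int requested)
{
    unsigned int size = 1;
    if (requested > 0x80000000u) {
        return 0x80000000u;        /* cap so the shift below cannot overflow */
    }
    while (size < requested) {
        size <<= 1;
    }
    return size;
}

/* Map a hash value to a slot; valid only when table_size is a power of two. */
static unsigned int sk_bucket_index(unsigned long h, unsigned int table_size)
{
    unsigned int mask = table_size - 1;   /* e.g. 16 -> binary 01111 */
    return (unsigned int)(h & mask);
}
```

For example, a requested size of 10 gives a table size of 16 and a mask of 15, so the hash value 37 (binary 100101) lands in slot 5. The mask is a cheaper equivalent of h % nTableSize, which is why the table size is kept at a power of two.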

What does pDataPtr do?

Normally, when a user inserts a key-value pair, Zend copies the value and points pData to the copy. The copy requires a call to Zend's internal routine emalloc to allocate memory, which is a time-consuming operation and consumes more memory than the value itself (the extra memory is used to store a cookie). If the value is very small, the waste is proportionally large. Since HashTable is mostly used to store pointer values, Zend introduces pDataPtr: when the value is exactly the size of a pointer, Zend copies it directly into pDataPtr and points pData to pDataPtr. This avoids the emalloc call and also helps the cache hit rate.
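A rough sketch of this optimization, using malloc in place of emalloc. The field names follow the Bucket definition; sk_bucket, sk_bucket_set and sk_bucket_clear are hypothetical names for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Reduced bucket: only the two value-related fields. */
typedef struct {
    void *pData;      /* always points at the stored copy of the value */
    void *pDataPtr;   /* inline storage for pointer-sized values */
} sk_bucket;

/* Store a copy of the size bytes at data; returns 0 on success, -1 on failure. */
static int sk_bucket_set(sk_bucket *b, const void *data, size_t size)
{
    if (size == sizeof(void *)) {
        /* Pointer-sized value: keep it inside the bucket, no allocation. */
        memcpy(&b->pDataPtr, data, sizeof(void *));
        b->pData = &b->pDataPtr;
    } else {
        b->pData = malloc(size ? size : 1);
        if (!b->pData) {
            return -1;
        }
        memcpy(b->pData, data, size);
        b->pDataPtr = NULL;
    }
    return 0;
}

/* Release the stored value; only the out-of-line case owns heap memory. */
static void sk_bucket_clear(sk_bucket *b)
{
    if (b->pData != (void *)&b->pDataPtr) {
        free(b->pData);
    }
    b->pData = NULL;
    b->pDataPtr = NULL;
}
```

Callers always read the value through pData and never need to know which of the two storage strategies was used. One consequence of the design is that such a bucket cannot be copied bytewise, because pData may point into the bucket itself.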

Why is the size of arKey only 1? Why not use pointers to manage keys?

arKey is the array that stores the key, but its declared size is only 1, which is not enough to hold a key. The following code can be found in the function that adds an element to a HashTable:

  p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);

It can be seen that Zend allocates for each bucket a block of memory large enough to hold both the bucket itself and the key. The upper part is the bucket, the lower part is the key, and arKey "happens" to be the last member of the bucket, so the key can be accessed through arKey. This technique is most common in memory management routines: when memory is requested, the routine actually allocates a block larger than the requested size. The extra part at the top is usually called a cookie, and it stores information about the block, such as its size and the previous and next pointers. This method is also used by Baidu's Transmit program.

Keys are not managed through a separate pointer in order to save one emalloc call and improve the cache hit rate. A further, necessary condition is that a key is fixed in most cases, so the whole bucket never has to be reallocated because a key grew longer. This also explains why the value is not stored as a trailing array in the same way: the value can change.
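The same single-allocation idea can be written with a C99 flexible array member, which is the standard-conforming form of the arKey[1] declaration plus the sizeof(Bucket) - 1 adjustment quoted above. This is a sketch with hypothetical names (sk_bucket, sk_bucket_new), using malloc instead of pemalloc, and it assumes that nKeyLength counts the terminating NUL.

```c
#include <stdlib.h>
#include <string.h>

typedef struct sk_bucket {
    unsigned long h;
    unsigned int nKeyLength;   /* assumed here to include the trailing NUL */
    char arKey[];              /* key bytes live directly after the struct */
} sk_bucket;

/* Allocate the bucket and its key in one block; returns NULL on failure. */
static sk_bucket *sk_bucket_new(const char *key, unsigned long h)
{
    size_t len = strlen(key) + 1;
    sk_bucket *p = malloc(sizeof(sk_bucket) + len);
    if (!p) {
        return NULL;
    }
    p->h = h;
    p->nKeyLength = (unsigned int)len;
    memcpy(p->arKey, key, len);   /* the key shares the bucket's allocation */
    return p;
}
```

A single free(p) releases both the bucket and its key, and reading the key after reading h usually touches memory that is already in the cache.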

1.2.2 PHP array

There is still one question about HashTable that has not been answered, that is, what does nNextFreeElement do?

Unlike an ordinary hash table, Zend's HashTable allows the user to specify the hash value directly and ignore the key, or even to omit the key altogether (in which case nKeyLength is 0). HashTable also supports an append operation, where the user provides only the value and no hash value. In that case Zend uses nNextFreeElement as the hash and then increments nNextFreeElement.

This behavior looks strange, because such a value cannot be looked up by key, which hardly seems like a hash table at all. The key to understanding it is that PHP arrays are implemented with HashTable. Associative arrays add elements to the HashTable with an ordinary key-value mapping, where the key is the string specified by the user. For non-associative arrays, the array subscript is used directly as the hash value and there is no key. nNextFreeElement is needed when associative and non-associative elements are mixed in one array, or when an operation such as array_push is used.

Now consider the value. The values of a PHP array are ordinary zval structures, and what is stored in the HashTable is a zval *. As described in the previous section, this zval * is stored directly in pDataPtr, with pData pointing to it. Because zval is used directly, the elements of an array can be of any PHP type.

Array traversal operations such as foreach and each are performed through the HashTable's doubly linked list, with pInternalPointer recording the current position as a cursor.
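The interplay between explicit integer subscripts and the append operation can be modelled with a tiny fixed-capacity array. This is only a sketch: sk_array, sk_index_set and sk_append are hypothetical names, lookup is a linear scan rather than a hash, and the rule that an explicit subscript at or above nNextFreeElement moves it to subscript + 1 is an assumption based on how PHP arrays behave ($a[5] = 'x'; $a[] = 'y'; stores 'y' at index 6).

```c
#include <stddef.h>

#define SK_CAP 16

typedef struct {
    unsigned long keys[SK_CAP];
    int values[SK_CAP];
    size_t count;
    unsigned long nNextFreeElement;   /* subscript the next append will use */
} sk_array;

static void sk_init(sk_array *a)
{
    a->count = 0;
    a->nNextFreeElement = 0;
}

/* $a[h] = value; returns 0 on success, -1 if the sketch's capacity is exhausted. */
static int sk_index_set(sk_array *a, unsigned long h, int value)
{
    size_t i;
    for (i = 0; i < a->count; i++) {
        if (a->keys[i] == h) {
            a->values[i] = value;     /* overwrite an existing subscript */
            return 0;
        }
    }
    if (a->count == SK_CAP) {
        return -1;
    }
    a->keys[a->count] = h;
    a->values[a->count] = value;
    a->count++;
    if (h >= a->nNextFreeElement) {
        a->nNextFreeElement = h + 1;  /* appends continue after the largest subscript */
    }
    return 0;
}

/* $a[] = value; returns the subscript that was used, or -1 on failure. */
static long sk_append(sk_array *a, int value)
{
    unsigned long h = a->nNextFreeElement;
    if (sk_index_set(a, h, value) != 0) {
        return -1;
    }
    return (long)h;
}
```

Two appends to an empty array use subscripts 0 and 1; after an explicit store at subscript 5, the next append uses 6.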

1.2.3 Variable symbol table

In addition to arrays, HashTable is also used to store many other data, such as PHP functions, variable symbols, loaded modules, class members, etc.

A variable symbol table is equivalent to an associative array. Its key is the variable name (so very long variable names are not a good idea), and its value is a zval *.

At any time, PHP code can see two variable symbol tables, symbol_table and active_symbol_table. The former stores global variables and is called the global symbol table. The latter is a pointer to the currently active variable symbol table, which is usually the global symbol table. However, each time a PHP function is entered (meaning a function written by the user in PHP code), Zend creates a local variable symbol table for the function and points active_symbol_table to it. Zend always accesses variables through active_symbol_table, which is how the scope of local variables is enforced.

However, if a function accesses a variable that it has marked as global, Zend performs special processing: in active_symbol_table it creates a reference to the variable of the same name in symbol_table, first creating that variable in symbol_table if it does not exist.
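The relationship between the two tables can be sketched with a toy symbol table. Everything here is hypothetical and heavily simplified: sk_symtab is a small fixed array rather than a HashTable, a long * stands in for zval *, sk_bind does not check for duplicate names, and sk_import_global models the handling of a variable marked as global.

```c
#include <stddef.h>
#include <string.h>

#define SK_MAX_VARS 8

/* One entry maps a variable name to a value slot (a stand-in for zval *). */
typedef struct { const char *name; long *slot; } sk_entry;
typedef struct { sk_entry vars[SK_MAX_VARS]; int count; } sk_symtab;

static sk_symtab sk_global;                 /* stands in for symbol_table */
static sk_symtab *sk_active = &sk_global;   /* stands in for active_symbol_table */

static long *sk_lookup(const sk_symtab *t, const char *name)
{
    int i;
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->vars[i].name, name) == 0) {
            return t->vars[i].slot;
        }
    }
    return NULL;
}

static int sk_bind(sk_symtab *t, const char *name, long *slot)
{
    if (t->count == SK_MAX_VARS) {
        return -1;
    }
    t->vars[t->count].name = name;
    t->vars[t->count].slot = slot;
    t->count++;
    return 0;
}

/* Models "global $name;": the active table gets an entry that shares the
   global variable's slot. fresh_slot is used if the global does not exist yet. */
static int sk_import_global(const char *name, long *fresh_slot)
{
    long *slot = sk_lookup(&sk_global, name);
    if (!slot) {
        *fresh_slot = 0;
        if (sk_bind(&sk_global, name, fresh_slot) != 0) {
            return -1;
        }
        slot = fresh_slot;
    }
    return sk_bind(sk_active, name, slot);
}
```

Entering a function corresponds to pointing sk_active at a fresh local table, where a global name is invisible until sk_import_global binds it; after that, a write through the local entry changes the global's value because both entries share one slot.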

1.3 Memory and files

The resources owned by a program generally include memory and files. For ordinary programs these resources are process-oriented: when the process ends, the operating system or C library automatically reclaims any resources that were not explicitly released.

PHP programs, however, are special in that they are page-based. A page also requests resources such as memory or files while it runs, but when the page finishes, the operating system or C library may not know that those resources need to be reclaimed. For example, suppose PHP is compiled into Apache as a module and Apache runs in prefork or worker mode. The Apache process or thread is then reused, and memory allocated by a PHP page stays allocated until the process itself exits or crashes.

To solve this problem, Zend provides a set of memory allocation APIs. They are used in the same way as the corresponding C functions. The difference is that they allocate memory from Zend's own memory pool, which allows automatic recycling at the end of each page. In our module, memory allocated for a page should use these APIs instead of the C routines. Otherwise Zend will try to efree our memory at the end of the page, and the result is usually a crash.

  emalloc()

  efree()

  estrdup()

  estrndup()

  ecalloc()

  erealloc()
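The idea of page-based recycling can be illustrated with a small tracking allocator. This is not how Zend's memory manager is implemented; it is a sketch under the assumption that every block carries a header (a cookie) linking it into a list of live blocks, and sk_emalloc, sk_efree and sk_request_shutdown are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header (cookie) placed in front of every block handed to the caller. */
typedef union sk_header {
    struct { union sk_header *prev; union sk_header *next; } link;
    long double align_ld;   /* these two members only force strict alignment */
    long long align_ll;
} sk_header;

static sk_header *sk_live = NULL;   /* list of blocks not yet freed */
static size_t sk_live_count = 0;

static void *sk_emalloc(size_t size)
{
    sk_header *h;
    if (size > (size_t)-1 - sizeof *h) {
        return NULL;
    }
    h = malloc(sizeof *h + size);
    if (!h) {
        return NULL;
    }
    h->link.prev = NULL;
    h->link.next = sk_live;
    if (sk_live) {
        sk_live->link.prev = h;
    }
    sk_live = h;
    sk_live_count++;
    return h + 1;                   /* user memory starts after the header */
}

static void sk_efree(void *ptr)
{
    sk_header *h;
    if (!ptr) {
        return;
    }
    h = (sk_header *)ptr - 1;       /* step back to the header */
    if (h->link.prev) {
        h->link.prev->link.next = h->link.next;
    } else {
        sk_live = h->link.next;
    }
    if (h->link.next) {
        h->link.next->link.prev = h->link.prev;
    }
    sk_live_count--;
    free(h);
}

/* End of page: reclaim everything the page forgot to free. */
static void sk_request_shutdown(void)
{
    while (sk_live) {
        sk_efree(sk_live + 1);
    }
}
```

The sketch also suggests why mixing allocators is dangerous: a pointer obtained from plain malloc has no header in front of it, so passing it to sk_efree would read and unlink garbage.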

In addition, Zend provides a set of macros of the form VCWD_xxx that replace the corresponding file APIs of the C library and the operating system. These macros support PHP's virtual working directory and should always be used in module code. See "TSRM/tsrm_virtual_cwd.h" in the PHP source code for their definitions. You may notice that none of these macros provides a close operation. This is because close operates on an already opened resource and does not involve a file path, so the C or operating system routine can be used directly. The same applies to operations such as read and write.
