
token_get_all

(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)

token_get_all - Split given source into PHP tokens

Description

token_get_all(string $code, int $flags = 0): array

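token_get_all() parses the given code string into PHP language tokens using the Zend engine's lexical scanner.

For a list of parser tokens, see List of Parser Tokens, or use token_name() to translate a token value into its string representation.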

Parameters

code

The PHP source to parse.

flags

Valid flags:

  • TOKEN_PARSE - Recognises the ability to use reserved words in specific contexts.

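Return Values

An array of token identifiers. Each individual token identifier is either a single character (i.e. ;, ., >, !, etc.), or a three-element array containing the token index in element 0, the string content of the original token in element 1 and the line number in element 2.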
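Because the return value mixes plain strings and three-element arrays, code that walks the result usually has to branch on is_array(). The following is a minimal sketch of that pattern. The helper name token_text() is hypothetical (it is not part of PHP). It also illustrates that joining the text of every token gives back the input string.

```php
<?php
// Hypothetical helper (not part of PHP): return the source text of a token,
// whether it is a one-character string or a [id, text, line] array.
function token_text($token): string
{
    return is_array($token) ? $token[1] : $token;
}

$source = '<?php $a = 1 + 2; ?>';
$tokens = token_get_all($source);

// Join the text of every token; the result should equal the input.
$rebuilt = implode('', array_map('token_text', $tokens));

var_dump($rebuilt === $source);
```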

Examples

Example #1 token_get_all() example

<?php
$tokens = token_get_all('<?php echo; ?>');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}
?>

The above example will output something similar to:

Line 1: T_OPEN_TAG ('<?php ')
Line 1: T_ECHO ('echo')
Line 1: T_WHITESPACE (' ')
Line 1: T_CLOSE_TAG ('?>')

Example #2 token_get_all() incorrect usage example

<?php
$tokens = token_get_all('/* comment */');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}
?>

The above example will output something similar to:

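Line 1: T_INLINE_HTML ('/* comment */')

Note that in the previous example the string is parsed as T_INLINE_HTML rather than the expected T_COMMENT. This is because no open tag was used in the provided code. It is equivalent to putting a comment outside of the PHP tags in a normal file.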
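When the string to tokenize is a bare fragment, one possible workaround is to prepend an open tag before calling token_get_all() and discard the resulting T_OPEN_TAG afterwards. This is only a sketch; tokenize_fragment() is a hypothetical name chosen for this example.

```php
<?php
// Hypothetical helper: tokenize a PHP fragment that has no open tag.
function tokenize_fragment(string $fragment): array
{
    $tokens = token_get_all('<?php ' . $fragment);
    array_shift($tokens); // discard the T_OPEN_TAG that was added above
    return $tokens;
}

foreach (tokenize_fragment('/* comment */') as $token) {
    if (is_array($token)) {
        echo token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}
```

The added prefix contains no newline, so the line numbers in element 2 still match the lines of the fragment.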

Example #3 token_get_all() on a class using a reserved word example

<?php

$source = <<<'code'
<?php

class A
{
    const PUBLIC = 1;
}
code;

$tokens = token_get_all($source, TOKEN_PARSE);

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo token_name($token[0]), PHP_EOL;
    }
}
?>

The above example will output something similar to:

T_OPEN_TAG
T_WHITESPACE
T_CLASS
T_WHITESPACE
T_STRING
T_CONST
T_WHITESPACE
T_STRING
T_LNUMBER

Without the TOKEN_PARSE flag, the penultimate token (T_STRING) would have been T_PUBLIC.
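To see the difference directly, the same source can be tokenized with and without the flag and the token names compared. This is a small sketch; token_names() is a hypothetical helper, not a PHP function.

```php
<?php
$source = '<?php class A { const PUBLIC = 1; }';

// Hypothetical helper: names of all array tokens for a given flags value.
function token_names(string $source, int $flags): array
{
    $names = [];
    foreach (token_get_all($source, $flags) as $token) {
        if (is_array($token)) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

$plain  = token_names($source, 0);
$parsed = token_names($source, TOKEN_PARSE);

// Does T_PUBLIC appear without the flag? With it?
var_dump(in_array('T_PUBLIC', $plain, true));
var_dump(in_array('T_PUBLIC', $parsed, true));
```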

See Also

  • PhpToken::tokenize() - Splits given source into PHP tokens, represented by PhpToken objects.
  • token_name() - Get the symbolic name of a given PHP token


User Contributed Notes 8 notes

Ivan Ustanin
5 years ago
As a caution: when using TOKEN_PARSE with an invalid php-file, one can get an error like this:
Parse error: syntax error, unexpected '__construct' (T_STRING), expecting function (T_FUNCTION) or const (T_CONST) in on line 15
Notice the missing filename as this function accepts a string, not a filename and thus has no idea of the latter.
However an exception would be more appreciated.
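On PHP 7 and later, a failed parse with TOKEN_PARSE should surface as a ParseError that can be caught, rather than as a fatal error. The sketch below rests on that assumption, and the invalid class body is made up for illustration.

```php
<?php
// Deliberately invalid source: a method without the "function" keyword.
$invalid = '<?php class A { __construct() {} }';

try {
    token_get_all($invalid, TOKEN_PARSE);
    $message = null;
} catch (ParseError $e) {
    // No filename is available here because only a string was passed in.
    $message = $e->getMessage();
}

echo $message === null ? 'no parse error' : "Parse error: {$message}", PHP_EOL;
```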
Theriault
7 years ago
The T_OPEN_TAG token will include the first trailing newline (\r, \n, or \r\n), tab (\t), or space. Any additional space after this token will be in a T_WHITESPACE token.

The T_CLOSE_TAG token will include the first trailing newline (\r, \n, or \r\n; as described here http://php.net/manual/en/language.basic-syntax.instruction-separation.php). Any additional space after this token will be in a T_INLINE_HTML token.
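A short sketch that makes the open-tag behaviour visible by printing the first two tokens with their whitespace escaped (the input string is made up for illustration):

```php
<?php
$tokens = token_get_all("<?php\n\n  echo 1;");

// Show the first two tokens with their whitespace made visible.
foreach (array_slice($tokens, 0, 2) as $token) {
    echo token_name($token[0]), ' ', json_encode($token[1]), PHP_EOL;
}
```

The first newline belongs to T_OPEN_TAG; the second newline and the indentation end up in the following T_WHITESPACE token.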
Dennis Robinson from basnetworks dot net
14 years ago
I wanted to use the tokenizer functions to count source lines of code, including counting comments. Attempting to do this with regular expressions does not work well because of situations where /* appears in a string, or other situations. The token_get_all() function makes this task easy by detecting all the comments properly. However, it does not tokenize newline characters. I wrote the below set of functions to also tokenize newline characters as T_NEW_LINE.

<?php

define('T_NEW_LINE', -1);

function token_get_all_nl($source)
{
    $new_tokens = array();

    // Get the tokens
    $tokens = token_get_all($source);

    // Split newlines into their own tokens
    foreach ($tokens as $token)
    {
        $token_name = is_array($token) ? $token[0] : null;
        $token_data = is_array($token) ? $token[1] : $token;

        // Do not split encapsed strings or multiline comments
        if ($token_name == T_CONSTANT_ENCAPSED_STRING || substr($token_data, 0, 2) == '/*')
        {
            $new_tokens[] = array($token_name, $token_data);
            continue;
        }

        // Split the data up by newlines
        $split_data = preg_split('#(\r\n|\n)#', $token_data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($split_data as $data)
        {
            if ($data == "\r\n" || $data == "\n")
            {
                // This is a new line token
                $new_tokens[] = array(T_NEW_LINE, $data);
            }
            else
            {
                // Add the token under the original token name
                $new_tokens[] = is_array($token) ? array($token_name, $data) : $data;
            }
        }
    }

    return $new_tokens;
}

function token_name_nl($token)
{
    if ($token === T_NEW_LINE)
    {
        return 'T_NEW_LINE';
    }

    return token_name($token);
}

?>

Example usage:

<?php

$tokens = token_get_all_nl(file_get_contents('somecode.php'));

foreach ($tokens as $token)
{
    if (is_array($token))
    {
        echo (token_name_nl($token[0]) . ': "' . $token[1] . '"<br />');
    }
    else
    {
        echo ('"' . $token . '"<br />');
    }
}

?>

I'm sure you can figure out how to count the lines of code, and lines of comments with these functions. This was a huge improvement on my previous attempt at counting lines of code with regular expressions. I hope this helps someone, as many of the user contributed examples on this website have helped me in the past.
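For comment lines alone, the count can also be taken straight from the T_COMMENT and T_DOC_COMMENT tokens, without the newline tokens. This is a rough sketch under a few assumptions: count_comment_lines() is a hypothetical name, only "\n" line breaks are counted, and two comments on the same source line are counted twice.

```php
<?php
// Hypothetical helper: count the source lines occupied by comments.
function count_comment_lines(string $source): int
{
    $lines = 0;
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
            // A trailing newline may or may not be part of a one-line comment
            // depending on the PHP version, so trim it before counting.
            $lines += substr_count(rtrim($token[1], "\r\n"), "\n") + 1;
        }
    }
    return $lines;
}

$source = <<<'CODE'
<?php
// one line
$s = "/* not a comment */";
/* two
   lines */
CODE;

echo count_comment_lines($source), PHP_EOL;
```

The "/* not a comment */" text inside the string literal is tokenized as part of the string, so it does not contribute to the count.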
gomodo at free dot fr
14 years ago
Yes, there are some problems with token_get_all() (on WAMP, PHP 5.3.0).

1: Line number bug
Since PHP 5.2.2, token_get_all() should return line numbers in element 2.
In practice (5.3.0 on WAMP) this works perfectly only for pure PHP code, not for PHP mixed with HTML. If token_get_all() detects some T_INLINE_HTML, you sometimes get wrong line numbers (the next line is returned). :(

2: A warning message can interfere with loops
When the PHP code is incomplete (e.g. code fed in line by line),
for example when a comment is not closed, token_get_all() can block loops on this warning:
Warning: Unterminated comment starting line

This problem does not seem to occur in CLI mode (PHP command line), only in web mode.

Until this is more stable, I use token_get_all() only on pure PHP code (not mixed with HTML):
First, extract the PHP code in full (including the open and close PHP tags).
Second, run token_get_all() on the pure PHP code.

3: Why is there no function to extract PHP code (to extract HTML, we have Tidy)?

In the meantime, I use a function of my own.

The code is at the end of this post:
http://www.developpez.net/forums/d786381/php/langage/fonctions/analyser-fichier-php-token_get_all/

This function does not support:
- The old notations "<? ?>" and "<% %>"
- heredoc syntax
- nowdoc syntax (since PHP 5.3.0)
bart
6 years ago
Not all tokens are returned as an array. The rule appears to be that single-character tokens are returned as a plain string instead. You don't get a line number for them. This is the case for braces ("{", "}"), parentheses ("(", ")"), brackets ("[", "]"), comma (","), semi-colon (";"), and a whole slew of single-character operator signs ("!", "=", "+", "*", "/", ".", ...). Multi-character operators such as "+=" are returned as arrays (T_PLUS_EQUAL).
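If a line number is needed for the string tokens as well, it can be derived from the array tokens around them. This is one possible sketch. tokens_with_lines() is a hypothetical helper, and it assumes "\n" line endings.

```php
<?php
// Hypothetical helper: turn every token into [id|null, text, line], filling in
// the line for one-character string tokens from the tokens seen before them.
function tokens_with_lines(string $source): array
{
    $line = 1;
    $result = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            [$id, $text, $line] = $token;
        } else {
            [$id, $text] = [null, $token];
        }
        $result[] = [$id, $text, $line];
        // Advance past any newlines contained in this token's text.
        $line += substr_count($text, "\n");
    }
    return $result;
}

// Print only the tokens that token_get_all() returned as plain strings.
foreach (tokens_with_lines("<?php\n\$a = 1;\nfoo();\n") as [$id, $text, $line]) {
    if ($id === null) {
        echo "Line {$line}: '{$text}'", PHP_EOL;
    }
}
```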
kevin at metalaxe dot com
15 years ago
Rogier, thanks for that fix. This bug still exists in php 5.2.5. I did notice though that it is possible for a notice to pop up from your code. Changing this line:

$temp[] = $tokens[0][2];

To read this:

$temp[] = isset($tokens[0][2])?$tokens[0][2]:'unknown';

fixes this notice.
rogier
16 years ago
Complementary note to code below:
Note that only the FIRST 2 (or 3, if needed) array elements will be updated.

Since I only encountered incorrect results on the FIRST occurrence of T_OPEN_TAG, I wrote this quick fix.
Any other following T_OPEN_TAG are, on my testing system (Apache 2.0.52, PHP 5.0.3), parsed correctly.

So, this function assumes only a possibly incorrect first T_OPEN_TAG.
Also, this function assumes the very first element (and ONLY the first element) of the token array to be the possibly incorrect token.
This effectively translates to the first character of the tokenized source to be the start of a php script opening tag '<', followed by either 'php' OR '%' (ASP_style)