(PHP 4 >= 4.4.3, PHP 5 >= 5.1.3)
mb_check_encoding — Check whether a string is valid in the specified encoding
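bool mb_check_encoding ([ string $var = NULL [, string $encoding = mb_internal_encoding() ]] )
Checks whether the specified byte stream is valid in the specified encoding. This helps prevent the so-called "Invalid Encoding Attack".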
var
The byte stream to check. If this parameter is omitted, the function checks all input from the beginning of the request.
encoding
The expected encoding.
Returns TRUE on success or FALSE on failure.
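As an illustration of the behaviour described above, here is a minimal sketch that checks two byte strings against UTF-8. The sample strings are assumptions chosen for this example, and the result for the truncated string assumes a PHP version in which mb_check_encoding() rejects incomplete sequences.

```php
<?php
// "\xE3\x81\x82" is the complete three-byte UTF-8 encoding of U+3042.
$complete = "\xE3\x81\x82";
// The same sequence with its last byte cut off (illustrative bad input).
$truncated = "\xE3\x81";

// Pass the encoding explicitly instead of relying on mb_internal_encoding().
var_dump(mb_check_encoding($complete, 'UTF-8'));
var_dump(mb_check_encoding($truncated, 'UTF-8'));
```

Naming the encoding explicitly keeps the result independent of the mb_internal_encoding() setting of the running script.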
richard at phase dot org (2012-05-08 15:40:22)
The issue whereby mb_check_encoding($string, 'UTF-8') falsely returns true for invalid UTF-8 byte sequences was resolved somewhere between PHP 5.2.0 and 5.2.6.
The following equivalent check seems to work in PHP 5.2.0 and 5.1.6:
$valid_utf8 = (@iconv('UTF-8','UTF-8',$string) === $string);
(with apologies for the @)
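The one-liner above can be wrapped in a small helper. This is only a sketch: the function name is_valid_utf8_iconv is hypothetical, and how strictly iconv() validates its input may depend on the underlying iconv implementation.

```php
<?php
// Hypothetical helper name; wraps the iconv round trip from the note above.
function is_valid_utf8_iconv($string)
{
    // iconv() returns false (and raises a notice, silenced here by @) when
    // it meets a byte sequence it cannot convert, so the strict comparison
    // fails for such input.
    return @iconv('UTF-8', 'UTF-8', $string) === $string;
}

var_dump(is_valid_utf8_iconv("abc"));  // plain ASCII
var_dump(is_valid_utf8_iconv("\xFF")); // 0xFF never occurs in UTF-8
```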
javalc6 at gmail dot com (2009-12-24 04:52:57)
To check whether a string is correctly encoded in UTF-8, I suggest the following function, which implements RFC 3629 better than mb_check_encoding():
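<?php
// Structural UTF-8 check: validates lead bytes and continuation-byte counts.
function check_utf8($str) {
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c > 127) { // non-ASCII; 0x80 itself is a stray continuation byte
            if ($c > 247) return false;       // 0xF8-0xFF never appear in UTF-8
            elseif ($c > 239) $bytes = 4;     // 0xF0-0xF7
            elseif ($c > 223) $bytes = 3;     // 0xE0-0xEF
            elseif ($c > 191) $bytes = 2;     // 0xC0-0xDF
            else return false;                // 0x80-0xBF: continuation byte without a lead byte
            if (($i + $bytes) > $len) return false; // sequence is truncated
            while ($bytes > 1) {
                $i++;
                $b = ord($str[$i]);
                if ($b < 128 || $b > 191) return false; // not a continuation byte
                $bytes--;
            }
        }
    }
    return true;
} // end of check_utf8
?>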
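The byte-counting loop above checks only lead bytes and continuation-byte counts, so overlong forms such as "\xC0\x80" and encoded surrogates still pass. As a stricter alternative, the sketch below swaps in a different technique: it lets PCRE validate the subject through the /u modifier. The helper name is_strict_utf8 is hypothetical, and the sketch assumes PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical helper name; relies on PCRE's own UTF-8 validation.
function is_strict_utf8($str)
{
    // With the /u modifier, preg_match() returns false when the subject is
    // not valid UTF-8; otherwise the empty pattern matches and it returns 1.
    return preg_match('//u', $str) === 1;
}

var_dump(is_strict_utf8("h\xC3\xA9llo")); // well-formed two-byte sequence
var_dump(is_strict_utf8("\xC0\x80"));     // overlong encoding of U+0000
var_dump(is_strict_utf8("\xED\xA0\x80")); // encoded surrogate U+D800
```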
jbricci at ya-right dot com (2009-03-01 18:52:38)
This function does not check for bad byte sequence(s); it only checks whether the byte stream is valid. If you want to verify that an encoded string is valid (i.e. does not contain any bad byte sequences), do the following:
<?php
/* check a string's encoded value by round-tripping it through a second encoding */
function checkEncoding ( $string, $string_encoding )
{
    /* intermediate encoding: UTF-32 for UTF-8 input, UTF-8 for UTF-32 input, otherwise the encoding itself */
    if ( $string_encoding == 'UTF-8' ) $via = 'UTF-32';
    elseif ( $string_encoding == 'UTF-32' ) $via = 'UTF-8';
    else $via = $string_encoding;
    /* mb_convert_encoding ( $str, $to, $from ): convert to $via and back, then compare with the input */
    return $string === mb_convert_encoding ( mb_convert_encoding ( $string, $via, $string_encoding ), $string_encoding, $via );
}
/* test variables: array ( byte string, encoding ) */
$tests = array (
    array ( "\x00\x81", "Shift_JIS" ), /* test 1 */
    array ( "\x00\xE3", "UTF-8" ),     /* test 2 */
);
foreach ( $tests as $test )
{
    list ( $string, $encoding ) = $test;
    /* mb_check_encoding (test for bad byte stream) */
    if ( true === mb_check_encoding ( $string, $encoding ) )
    {
        echo 'valid (' . $encoding . ') encoded byte stream!<br />';
    }
    else
    {
        echo 'invalid (' . $encoding . ') encoded byte stream!<br />';
    }
    /* checkEncoding (test for bad byte sequence(s)) */
    if ( true === checkEncoding ( $string, $encoding ) )
    {
        echo 'valid (' . $encoding . ') encoded byte sequence!<br />';
    }
    else
    {
        echo 'invalid (' . $encoding . ') encoded byte sequence!<br />';
    }
}
?>