Normalizer::normalize

normalizer_normalize

(PHP 5 >= 5.3.0, PECL intl >= 1.0.0)

Normalizer::normalize -- normalizer_normalize — Normalizes the input provided and returns the normalized string

说明

面向对象风格

public static string Normalizer::normalize ( string $input [, string $form = Normalizer::FORM_C ] )

过程化风格

string normalizer_normalize ( string $input [, string $form = Normalizer::FORM_C ] )

Normalizes the input provided and returns the normalized string

参数

input: The input string to normalize
form: One of the normalization forms.

返回值

The normalized string or NULL if an error occurred.

范例

Example #1 normalizer_normalize() example


<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$char_1 = normalizer_normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = normalizer_normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );
 
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>

Example #2 OO example


<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$char_1 = Normalizer::normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = Normalizer::normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );
 
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>

以上例程会输出：

%C3%85 %C3%85

参见

normalizer_is_normalized() - Checks if the provided string is already in the specified normalization form.

用户评论:

o_shes01 at uni-muenster dot de (2011-01-23 08:59:55)

This method/function will return boolean false if $input is not a valid utf-8-string, e.g. <?php var_dump(Normalizer::normalize("\xFF")); // prints "bool(false)" ?>

akniep at rayo dot info (2009-07-30 09:03:53)

Especially when matching texts against each-other or against keywords, it is helpful to normalize the texts before. The following function removes all diacritics (marks like accents) from a given UTF8-encoded texts and returns ASCii-text. Be sure to have the PHP-Normalizer-extension (intl and icu) installed. Tipp: You may also want to map the text to lower case before execute matching procedures ... <?php function normalizeUtf8String( $s) { // Normalizer-class missing! if (! class_exists("Normalizer", $autoload = false)) return $original_string; // maps German (umlauts) and other European characters onto two characters before just removing diacritics $s = preg_replace( '@\x{00c4}@u' , "AE", $s ); // umlaut ? => AE $s = preg_replace( '@\x{00d6}@u' , "OE", $s ); // umlaut ? => OE $s = preg_replace( '@\x{00dc}@u' , "UE", $s ); // umlaut ? => UE $s = preg_replace( '@\x{00e4}@u' , "ae", $s ); // umlaut ? => ae $s = preg_replace( '@\x{00f6}@u' , "oe", $s ); // umlaut ? => oe $s = preg_replace( '@\x{00fc}@u' , "ue", $s ); // umlaut ü => ue $s = preg_replace( '@\x{00f1}@u' , "ny", $s ); // ? => ny $s = preg_replace( '@\x{00ff}@u' , "yu", $s ); // ? => yu // maps special characters (characters with diacritics) on their base-character followed by the diacritical mark // exmaple: ? => U?, á => a` $s = Normalizer::normalize( $s, Normalizer::FORM_D ); $s = preg_replace( '@\pM@u' , "", $s ); // removes diacritics $s = preg_replace( '@\x{00df}@u' , "ss", $s ); // maps German ? onto ss $s = preg_replace( '@\x{00c6}@u' , "AE", $s ); // ? => AE $s = preg_replace( '@\x{00e6}@u' , "ae", $s ); // ? => ae $s = preg_replace( '@\x{0132}@u' , "IJ", $s ); // ? => IJ $s = preg_replace( '@\x{0133}@u' , "ij", $s ); // ? => ij $s = preg_replace( '@\x{0152}@u' , "OE", $s ); // ? => OE $s = preg_replace( '@\x{0153}@u' , "oe", $s ); // ? => oe $s = preg_replace( '@\x{00d0}@u' , "D", $s ); // ? => D $s = preg_replace( '@\x{0110}@u' , "D", $s ); // ? => D $s = preg_replace( '@\x{00f0}@u' , "d", $s ); // ? => d $s = preg_replace( '@\x{0111}@u' , "d", $s ); // d => d $s = preg_replace( '@\x{0126}@u' , "H", $s ); // H => H $s = preg_replace( '@\x{0127}@u' , "h", $s ); // h => h $s = preg_replace( '@\x{0131}@u' , "i", $s ); // i => i $s = preg_replace( '@\x{0138}@u' , "k", $s ); // ? => k $s = preg_replace( '@\x{013f}@u' , "L", $s ); // ? => L $s = preg_replace( '@\x{0141}@u' , "L", $s ); // L => L $s = preg_replace( '@\x{0140}@u' , "l", $s ); // ? => l $s = preg_replace( '@\x{0142}@u' , "l", $s ); // l => l $s = preg_replace( '@\x{014a}@u' , "N", $s ); // ? => N $s = preg_replace( '@\x{0149}@u' , "n", $s ); // ? => n $s = preg_replace( '@\x{014b}@u' , "n", $s ); // ? => n $s = preg_replace( '@\x{00d8}@u' , "O", $s ); // ? => O $s = preg_replace( '@\x{00f8}@u' , "o", $s ); // ? => o $s = preg_replace( '@\x{017f}@u' , "s", $s ); // ? => s $s = preg_replace( '@\x{00de}@u' , "T", $s ); // ? => T $s = preg_replace( '@\x{0166}@u' , "T", $s ); // T => T $s = preg_replace( '@\x{00fe}@u' , "t", $s ); // ? => t $s = preg_replace( '@\x{0167}@u' , "t", $s ); // t => t // remove all non-ASCii characters $s = preg_replace( '@[^\0-\x80]@u' , "", $s ); // possible errors in UTF8-regular-expressions if (empty($s)) return $original_string; else return $s; } ?> The above function is mainly based on the following article: http://ahinea.com/en/tech/accented-translate.html