Sam (2013-01-25 19:38:40)
Note that JS comments are also removed by strip_tags: stripping the tags from the following sample returns an empty string.
<script type="text/javascript">
// Add custom parameters here.
</script>
bzplan at web dot de (2012-10-07 07:57:40)
HTML code like this:
<?php
$html = '
<div>
<p style="color:blue;">color is blue</p><p>size is <span style="font-size:200%;">huge</span></p>
<p>material is wood</p>
</div>
';
?>
with <?php $str = strip_tags($html); ?>
... the result is:
$str = 'color is bluesize is huge
material is wood';
notice: the words 'blue' and 'size' grow together :(
and line-breaks are still in new string $str
if you need a space between the words (and without line-break)
use my function: <?php $str = rip_tags($html); ?>
... the result is:
$str = 'color is blue size is huge material is wood';
the function:
<?php
// --------------------------------------------------------------
function rip_tags($string) {
// ----- remove HTML TAGs -----
$string = preg_replace ('/<[^>]*>/', ' ', $string);
// ----- remove control characters -----
$string = str_replace("\r", '', $string); // --- remove entirely (replace with empty string)
$string = str_replace("\n", ' ', $string); // --- replace with space
$string = str_replace("\t", ' ', $string); // --- replace with space
// ----- remove multiple spaces -----
$string = trim(preg_replace('/ {2,}/', ' ', $string));
return $string;
}
// --------------------------------------------------------------
?>
the KEY is the regex pattern: '/<[^>]*>/'
instead of strip_tags()
... then remove control characters and multiple spaces
:)
Abdul Al-hasany (2011-02-02 23:44:27)
As noted in the documentation, strip_tags strips PHP and comment tags even if they are added to $allowable_tags.
Here is a little workaround for this issue:
<?php
function stripTags($text, $tags)
{
// replace php and comments tags so they do not get stripped
$text = preg_replace("@<\?@", "#?#", $text);
$text = preg_replace("@<!--@", "#!--#", $text);
// strip tags normally
$text = strip_tags($text, $tags);
// return php and comments tags to their original form
$text = preg_replace("@#\?#@", "<?", $text);
$text = preg_replace("@#!--#@", "<!--", $text);
return $text;
}
?>
The function replaces the tags with hash-delimited placeholders so strip_tags does not identify them as tags; once strip_tags has done its job, the placeholders are converted back to their original form.
tom at cowin dot us (2010-08-27 19:04:16)
With most web based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this fn over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you - great, if you can improve on it or have a better way - please - post it...
<?php
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
mb_regex_encoding('UTF-8');
//replace MS special characters first
$search = array('/‘/u', '/’/u', '/“/u', '/”/u', '/—/u');
$replace = array('\'', '\'', '"', '"', '-');
$text = preg_replace($search, $replace, $text);
//make sure _all_ html entities are converted to the plain ascii equivalents - it appears
//in some MS headers, some html entities are encoded and some aren't
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
//try to strip out any C style comments first, since these, embedded in html comments, seem to
//prevent strip_tags from removing html comments (MS Word introduced combination)
if(mb_stripos($text, '/*') !== FALSE){
$text = preg_replace('#/\*.*?\*/#s', '', $text); //mb_ereg patterns take no delimiters, so use preg_replace here
}
//introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be
//stripped: '<1' becomes '< 1' (note: somewhat application specific)
$text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
$text = strip_tags($text, $allowed_tags);
//eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
$text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
//strip out inline css and simplify style tags
$search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
$replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
$text = preg_replace($search, $replace, $text);
//on some of the ?newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
//that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
//some MS Style Definitions - this last bit gets rid of any leftover comments
$num_matches = preg_match_all("/\<!--/u", $text, $matches);
if($num_matches){
$text = preg_replace('/<!--.*?-->/su', '', $text); //lazy match, so text between two separate comments is kept
}
return $text;
}
?>
cyex at hotmail dot com (2010-07-08 03:40:49)
I thought someone else might find this useful... a simple way to strip BBCode:
<?php
$bbcode_str = "Here is some [b]bold text[/b] and some [color=#FF0000]red text[/color]!";
$plain_text = strip_tags(str_replace(array('[',']'), array('<','>'), $bbcode_str));
// $plain_text is now: Here is some bold text and some red text!
?>
brettz9 AAT yah (2009-04-05 08:10:58)
Works on shortened <?...?> syntax and thus also will remove XML processing instructions.
hongong at webafrica dot org dot za (2009-03-26 12:52:17)
An easy way to clean a string of all CDATA encapsulation.
<?php
function strip_cdata($string)
{
preg_match_all('/<!\[cdata\[(.*?)\]\]>/is', $string, $matches);
return str_replace($matches[0], $matches[1], $string);
}
?>
Example: echo strip_cdata('<![CDATA[Text]]>');
Returns: Text
kai at froghh dot de (2009-03-06 08:45:05)
A function that decides whether < is the start of a tag or a less-than / less-than-or-equal sign:
<?php
function lt_replace($str){
return preg_replace("/<([^[:alpha:]])/", '&lt;\\1', $str);
}
?>
It's to be used before strip_tags.
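A small sketch of how this kind of pre-escaping might be used, assuming the goal is to keep comparisons such as `<5` in the text. The helper name `escape_bare_lt` is hypothetical, not part of PHP, and it is deliberately narrower than the function above: it only escapes `<` when a digit, whitespace, or `=` follows.

```php
<?php
// Hypothetical helper: escape "<" when it is followed by a digit,
// whitespace or "=", i.e. when it looks like a comparison, not a tag.
function escape_bare_lt($str)
{
    return preg_replace('/<(?=[\s\d=])/', '&lt;', $str);
}

$input = 'x <5 and y > 3, <b>bold</b>';
$clean = strip_tags(escape_bare_lt($input));
// $clean is: x &lt;5 and y > 3, bold
```

The escaped `&lt;` stays in the result as an entity, which is usually what you want if the text is going back into an HTML page.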
CEO at CarPool2Camp dot org (2009-02-17 11:10:27)
Note the different outputs from different versions of the same tag:
<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br>');
var_dump($new); // OUTPUTS string(21) "<br>EachNew<br />Line"
?>
<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br/>');
var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"
?>
<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br />');
var_dump($new); // OUTPUTS string(11) "EachNewLine"
?>
mariusz.tarnaski at wp dot pl (2008-11-12 08:05:25)
Hi. I made a function that removes the HTML tags along with their contents:
Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
$tags = array_unique($tags[1]);
if(is_array($tags) AND count($tags) > 0) {
if($invert == FALSE) {
return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
}
else {
return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
}
}
elseif($invert == FALSE) {
return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
}
return $text;
}
?>
Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
Result for strip_tags($text):
sample text with tags
Result for strip_tags_content($text):
text with
Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with
Result for strip_tags_content($text, '<b>', TRUE):
text with <div>tags</div>
I hope someone finds this useful :)
southsentry at yahoo dot com (2008-09-25 09:15:42)
I was looking for a simple way to ban html from review posts, and the like. I have seen a few classes to do it. This line, while it doesn't strip the post, effectively blocks people from posting html in review and other forms.
<?php
if (strlen(strip_tags($review)) < strlen($review)) {
return false;
}
?>
If you want to further get by the tricksters that use & for html links, include this:
<?php
if (strlen(strip_tags($review)) < strlen($review)) {
return false;
} elseif ( strpos($review, "&") !== false) {
return 5;
}
?>
I hope this helps someone out!
Liam Morland (2008-08-23 17:58:56)
Here is a suggestion for getting rid of attributes: After you run your HTML through strip_tags(), use the DOM interface to parse the HTML. Recursively walk through the DOM tree and remove any unwanted attributes. Serialize the DOM back to the HTML string.
Don't make the default permit mistake: Make a list of the attributes you want to ALLOW and remove any others, rather than removing a specific list, which may be missing something important.
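The approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the function names `remove_unlisted_attributes` and `strip_tags_and_attributes` are made up for this note, the default allow lists are arbitrary, and the input is assumed to be UTF-8. It does not inspect attribute values, so a `javascript:` URL inside an allowed `href` would still get through.

```php
<?php
// Hypothetical helper, not a PHP built-in: recursively remove every
// attribute whose name is not in the allow list.
function remove_unlisted_attributes(DOMNode $node, array $allowed)
{
    if ($node instanceof DOMElement) {
        // Copy the names first; removing while iterating the live map can skip entries.
        $names = array();
        foreach ($node->attributes as $attribute) {
            $names[] = $attribute->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $allowed, true)) {
                $node->removeAttribute($name);
            }
        }
    }
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            remove_unlisted_attributes($child, $allowed);
        }
    }
}

function strip_tags_and_attributes($html, $allowedTags = '<p><a><b><i>', array $allowedAttributes = array('href', 'title'))
{
    // Step 1: drop every tag that is not allowed.
    $html = strip_tags($html, $allowedTags);

    // Step 2: parse what is left. The meta tag assumes UTF-8 input.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Step 3: walk the tree and keep only allow-listed attributes.
    $body = $doc->getElementsByTagName('body')->item(0);
    remove_unlisted_attributes($body, $allowedAttributes);

    // Step 4: serialize only the children of <body>, not the wrapper added above.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$dirty = '<p onclick="steal()" title="t">Hi <a href="/a" style="color:red">link</a></p>';
$clean = strip_tags_and_attributes($dirty);
// onclick and style are gone from $clean; title and href remain
```

Passing a node to saveHTML() needs PHP 5.3.6 or later.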
Kalle Sommer Nielsen (2008-03-30 14:05:58)
This adds a lot of missing JavaScript events to the strip_tags_attributes() function from the entries below.
Props to MSDN for lots of them ;)
<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload'))
{
if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));
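// the /e modifier was removed in PHP 7, so the inner replacement runs in a callback instead
$aPatterns = array('/javascript:[^"\']*/i', '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/');
return preg_replace_callback('/<(.*?)>/i', function ($aMatch) use ($aPatterns) {
return '<' . preg_replace($aPatterns, array('', '', ' '), $aMatch[1]) . '>';
}, strip_tags($sSource, implode('', $aAllowedTags)));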
}
?>
jausions at php dot net (2006-09-18 23:57:35)
To sanitize any user input, you should also consider PEAR's HTML_Safe package.
http://pear.php.net/package/HTML_Safe
admin at automapit dot com (2006-08-09 10:01:54)
<?php
function html2txt($document){
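// Patterns are applied in order, so script, style and comment blocks must be removed before the generic tag pattern
$search = array('@<script[^>]*?>.*?</script>@si', // Strip out javascript
'@<style[^>]*?>.*?</style>@si', // Strip style tags and their contents
'@<![\s\S]*?--[ \t\n\r]*>@', // Strip multi-line comments including CDATA
'@<[\/\!]*?[^<>]*?>@si' // Strip out remaining HTML tags
);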
$text = preg_replace($search, '', $document);
return $text;
}
?>
This function turns HTML into text... strips tags, comments spanning multiple lines including CDATA, and anything else that gets in its way.
It's a Frankenstein function I made from bits picked up on my travels through the web, thanks to the many who have unwittingly contributed!
cesar at nixar dot org (2006-03-07 11:44:53)
Here is a recursive function for strip_tags like the one shown in the stripslashes manual page.
<?php
function strip_tags_deep($value)
{
return is_array($value) ?
array_map('strip_tags_deep', $value) :
strip_tags($value);
}
// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);
// Output
print_r($array);
?>
salavert at~ akelos (2006-02-13 02:21:13)
<?php
/**
* Works like PHP function strip_tags, but it only removes selected tags.
* Example:
* strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
*/
function strip_selected_tags($text, $tags = array())
{
$args = func_get_args();
$text = array_shift($args);
$tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
foreach ($tags as $tag){
if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){
$text = str_replace($found[0],$found[1],$text);
}
}
return $text;
}
?>
Hope you find it useful,
Jose Salavert
Anonymous User (2004-08-22 09:24:59)
Be aware that tags constitute visual whitespace, so stripping may leave the resulting text looking misjoined.
For example,
"<strong>This is a bit of text</strong><p />Followed by this bit"
are visually separate paragraphs, but if simply stripped of tags will result in
"This is a bit of textFollowed by this bit"
which may not be what you want, e.g. if you are creating an excerpt for an RSS description field.
The workaround is to force whitespace prior to stripping, using something like this:
<?php
$text = getTheText();
$text = preg_replace('/</',' <',$text);
$text = preg_replace('/>/','> ',$text);
$desc = html_entity_decode(strip_tags($text));
$desc = preg_replace('/[\n\r\t]/',' ',$desc);
$desc = preg_replace('/ {2,}/',' ',$desc);
?>
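As a worked example of the same idea, here is a compact sketch applied to the sample string from this note. The function name `strip_tags_spaced` is hypothetical, and collapsing all whitespace with `\s+` is a choice made here rather than something the note prescribes.

```php
<?php
// Hypothetical helper name; pads tags with spaces before stripping them.
function strip_tags_spaced($html)
{
    $text = str_replace(array('<', '>'), array(' <', '> '), $html);
    $text = html_entity_decode(strip_tags($text));
    // Collapse every run of whitespace (including \n, \r, \t) into one space.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<strong>This is a bit of text</strong><p />Followed by this bit';
$desc = strip_tags_spaced($html);
// $desc is: This is a bit of text Followed by this bit
```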
chrisj at thecyberpunk dot com (2001-12-18 12:57:25)
strip_tags doesn't recognize that CSS within style tags is not document text. To fix this, do something similar to the following:
$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU",'',$htmlstring);
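To make the difference visible, here is a short before/after sketch on a made-up input string. It uses a lazy `.*?` with `#` delimiters in place of the `U` modifier, which should behave the same way for this purpose.

```php
<?php
$htmlstring = '<style type="text/css">p { color: red; }</style><p>Hello</p>';

// Without the pre-pass the CSS rules survive as plain text.
$naive = strip_tags($htmlstring);

// Remove <style> blocks, contents included, before stripping the rest.
$withoutCss = preg_replace('#<style[^>]*>.*?</style>#si', '', $htmlstring);
$clean = strip_tags($withoutCss);
// $naive is "p { color: red; }Hello", $clean is "Hello"
```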