PHP: parse a URL and return its components

Version	Description
5.4.7	Fixed host recognition when the scheme is omitted.
5.3.3	No longer raises an `E_WARNING` when URL parsing fails.
5.1.2	Added the `component` parameter.
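Here is a minimal usage sketch (the URL is an invented example). Without the `component` argument, parse_url() returns an associative array that holds only the parts present in the URL. With one of the PHP_URL_* constants, it returns just that part, or null if the URL doesn't contain it.

```php
<?php
$url = 'https://user:secret@www.example.com:8080/path/page.php?a=1&b=2#top';

// Full parse: an associative array containing only the components that are present.
$parts = parse_url($url);
print_r($parts);

// Single component (PHP >= 5.1.2): a string (an int for the port), or null if absent.
$host = parse_url($url, PHP_URL_HOST);                              // "www.example.com"
$port = parse_url($url, PHP_URL_PORT);                              // 8080 (int)
$missing = parse_url('https://www.example.com/', PHP_URL_QUERY);    // null: no "?" in the URL
```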

User comments:

utilmind (2013-07-01 00:06:04)

parse_url() doesn't work if the protocol isn't specified. Omitting it seems to be standard practice: even YouTube leaves out the protocol in the embed code it generates, which looks like "//youtube.com/etc". So, to avoid bugs, always check whether the provided URL has a protocol. If it doesn't (it starts with two slashes), add the "http:" prefix.
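Since PHP 5.4.7 (see the changelog at the top), parse_url() does detect the host of a protocol-relative URL such as //youtube.com/..., so the problem described above mostly affects older versions. If you still need an absolute URL, the suggested fix can be sketched as follows. This assumes PHP 7+. `with_default_scheme()` is a hypothetical helper name, and the default scheme `http` is simply the one proposed in the comment.

```php
<?php
/**
 * Hypothetical helper: prefix a default scheme to a protocol-relative URL
 * ("//host/path"), as suggested above. Any other input is returned unchanged.
 */
function with_default_scheme(string $url, string $scheme = 'http'): string
{
    if (strncmp($url, '//', 2) === 0) {
        return $scheme . ':' . $url;
    }
    return $url;
}

$embed = '//www.youtube.com/embed/abc123';
$hostOnly = parse_url($embed, PHP_URL_HOST);   // the host is detected on PHP >= 5.4.7
$absolute = with_default_scheme($embed);       // the same URL, now with a scheme
$scheme = parse_url($absolute, PHP_URL_SCHEME);
```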

ashley at ashleywilson dot com dot au (2012-07-13 05:07:52)

You could write a wrapper for parse_url() that converts the query string into an array using parse_str(), as follows:
<?php
/**
 * Split a given URL into its components.
 * Uses parse_url() followed by parse_str() on the query string.
 *
 * @param string $url The string to decode.
 * @return array Associative array containing the different components.
 */
function parse_url_detail($url) {
    $parts = parse_url($url);
    if (isset($parts['query'])) {
        // parse_str() URL-decodes keys and values itself; decoding the whole
        // query first would break values that contain an encoded "&" or "=".
        parse_str($parts['query'], $parts['query']);
    }
    return $parts;
}
?>

Anonymous (2012-06-05 14:46:51)

For a query string of the form page.php?fail, you can check array_keys($_GET).

dev at qwerty dot ms (2012-05-26 20:10:45)

Everyone writing parse_query functions to parse URL queries should note two things (as seen above, every attempt so far has missed them):

1. Fields can also be separated by semicolons, not just by &. So field1=val1;field2=val2 is a valid query with two fields. Instead of exploding on &, split like this:
<?php
// split() was removed in PHP 7; preg_split() splits on either separator
$queryFields = preg_split('/[;&]/', $data['query']);
?>

2. Fields don't necessarily have values. Many URLs look like page.php?fail, so check whether you actually get a second element after exploding on the equals sign.
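The following sketch applies both points and assumes PHP 7+. `parse_query_fields()` is a hypothetical name, not a PHP function. It splits on either separator with preg_split(), splits each field on the first `=` only, and maps value-less fields to null. Unlike parse_str(), it does not expand `key[]` into arrays and does not rewrite dots in key names.

```php
<?php
/**
 * Hypothetical helper: split a query string on both "&" and ";" and keep
 * fields that have no "=" (e.g. "page.php?fail"), mapping them to null.
 * Later duplicates overwrite earlier ones; "key[]" arrays are not expanded.
 */
function parse_query_fields(string $query): array
{
    $fields = [];
    foreach (preg_split('/[;&]/', $query, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        // Split on the first "=" only, so values may themselves contain "=".
        $kv = explode('=', $pair, 2);
        $key = urldecode($kv[0]);
        $fields[$key] = isset($kv[1]) ? urldecode($kv[1]) : null;
    }
    return $fields;
}

$query = (string) parse_url('http://example.com/page.php?field1=val1;field2=val2&fail', PHP_URL_QUERY);
$fields = parse_query_fields($query);
```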

laszlo dot janszky at gmail dot com (2012-05-24 20:20:13)

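Here is another UTF-8-compatible parse_url() function:
<?php
function mb_parse_url($url) {
    // URL-encode every run of characters that are not URL delimiters,
    // so that parse_url() only ever sees plain ASCII.
    $encodedUrl = preg_replace_callback('%[^:/?#&=@\.]+%usD', function ($m) {
        return urlencode($m[0]);
    }, $url);
    $components = parse_url($encodedUrl);
    if ($components === false) {
        return false;
    }
    // Decode each component back to the original characters
    foreach ($components as &$component) {
        $component = urldecode($component);
    }
    unset($component);
    return $components;
}
?>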

therselman at gmail (2012-01-28 01:03:41)

A UTF-8-aware parse_url() replacement. Even though UTF-8 characters are not allowed in URLs, I have to work with a lot of them, and parse_url() breaks on them. This is based largely on the work of "mallluhuct at gmail dot com". I added parse_url()-compatible named values, which make the array values much easier to work with than plain numbers. I also implemented detection of the port and the username/password. A back-reference improves detection of URLs like this one: //en.wikipedia.com ...

Such a URL is technically a relative reference rather than an absolute URL. It is used extensively on sites like Wikipedia in the href of anchor tags, where browsers accept it, so it is one of the URL types you have to support when crawling pages. This function detects it accurately as the host name, instead of as the "path" as in all the other examples.

I am submitting my complete function (instead of just the RegExp), which is an almost drop-in replacement for parse_url(). It returns a cleaned-up array (or false) with values compatible with parse_url(). I could have told preg_match() not to store the unused extra values, but that would complicate the RegExp and make it harder to read, understand and extend. The key to detecting UTF-8 characters is the "u" (UTF-8) pattern modifier in preg_match().
<?php
function parse_utf8_url($url) {
    static $keys = array('scheme'=>0, 'user'=>0, 'pass'=>0, 'host'=>0, 'port'=>0, 'path'=>0, 'query'=>0, 'fragment'=>0);
    if (is_string($url) && preg_match(
            '~^((?P<scheme>[^:/?#]+):(//))?((\\3|//)?(?:(?P<user>[^:]+):(?P<pass>[^@]+)@)?(?P<host>[^/?:#]*))(:(?P<port>\\d+))?' .
            '(?P<path>[^?#]*)(\\?(?P<query>[^#]*))?(#(?P<fragment>.*))?~u', $url, $matches)) {
        // Keep only the named, non-empty parts, mirroring parse_url()'s output
        foreach ($matches as $key => $value) {
            if (!isset($keys[$key]) || empty($value)) {
                unset($matches[$key]);
            }
        }
        return $matches;
    }
    return false;
}
?>
UTF-8 URLs can (and should) be normalized after extraction with this function.

thomas at gielfeldt dot com (2011-12-02 02:50:47)

If you haven't yet found a simple way to convert a parsed URL back into a string, here's an example:
<?php
$url = 'http://usr:pss@example.com:81/mypath/myfile.html?a=b&b[]=2&b[]=3#myfragment';
if ($url === unparse_url(parse_url($url))) {
    print "YES, they match!\n";
}

function unparse_url($parsed_url) {
    $scheme   = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
    $host     = isset($parsed_url['host']) ? $parsed_url['host'] : '';
    $port     = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
    $user     = isset($parsed_url['user']) ? $parsed_url['user'] : '';
    $pass     = isset($parsed_url['pass']) ? ':' . $parsed_url['pass'] : '';
    $pass     = ($user || $pass) ? "$pass@" : '';
    $path     = isset($parsed_url['path']) ? $parsed_url['path'] : '';
    $query    = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
    $fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
    return "$scheme$user$pass$host$port$path$query$fragment";
}
?>

mallluhuct at gmail dot com (2011-07-18 19:38:56)

From RFC 3986 (Appendix B):

As the "first-match-wins" algorithm is identical to the "greedy" disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential five components of a URI reference.

The following line is the regular expression for breaking-down a well-formed URI reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to

      http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related

where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as

      scheme    = $2
      authority = $4
      path      = $5
      query     = $7
      fragment  = $9

Going in the opposite direction, we can recreate a URI reference from its components by using the algorithm of Section 5.3.
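The Appendix B expression translates directly into PHP. The sketch below assumes PHP 7.2+ for the PREG_UNMATCHED_AS_NULL flag, and `rfc3986_split()` is a hypothetical name. It only splits a reference into the five components. It does not validate them, and unlike parse_url() it does not split the authority into user, host and port.

```php
<?php
/**
 * Sketch: split a URI reference with the regular expression from RFC 3986,
 * Appendix B. Group numbers follow the RFC ($2 scheme, $4 authority,
 * $5 path, $7 query, $9 fragment). Absent components come back as null.
 */
function rfc3986_split(string $uri): array
{
    // The pattern can match the empty string, so preg_match() always succeeds.
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~', $uri, $m, PREG_UNMATCHED_AS_NULL);
    return [
        'scheme'    => $m[2] ?? null,
        'authority' => $m[4] ?? null,
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? null,
        'fragment'  => $m[9] ?? null,
    ];
}

$parts = rfc3986_split('http://www.ics.uci.edu/pub/ietf/uri/#Related');
```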

gustavo dot andriuolo at vulcabras dot com dot ar (2011-07-12 18:31:30)

Here's a method to get the REAL name of a domain. It returns just the domain name, nothing else. It first checks that the host is not an IP address, then returns the name:
<?php
function esip($ip_addr) {
    // First, match the format of the IP address
    if (preg_match("/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/", $ip_addr)) {
        // Now separate the integer values
        $parts = explode(".", $ip_addr);
        // Each part must be in the range 0-255
        foreach ($parts as $ip_parts) {
            if (intval($ip_parts) > 255 || intval($ip_parts) < 0) {
                return FALSE; // number is not within the range 0-255
            }
        }
        return TRUE;
    } else {
        return FALSE; // the format of the IP address doesn't match
    }
}

function domain($domainb) {
    $bits = explode('/', $domainb);
    if ($bits[0] == 'http:' || $bits[0] == 'https:') {
        $domainb = $bits[2];
    } else {
        $domainb = $bits[0];
    }
    unset($bits);
    $bits = explode('.', $domainb);
    $idz = count($bits);
    $idz -= 3;
    if (strlen($bits[($idz + 2)]) == 2) {
        $url = $bits[$idz] . '.' . $bits[($idz + 1)] . '.' . $bits[($idz + 2)];
    } else if (strlen($bits[($idz + 2)]) == 0) {
        $url = $bits[($idz)] . '.' . $bits[($idz + 1)];
    } else {
        $url = $bits[($idz + 1)] . '.' . $bits[($idz + 2)];
    }
    return $url;
}

$address = 'clients1.sub3.google.co.uk';
$parsed_url = parse_url($address);
// Without a scheme, parse_url() puts the whole address in 'path' and sets no 'host'
$host = isset($parsed_url['host']) ? $parsed_url['host'] : '';
if (!esip($host)) {
    if ($host != "") {
        $host = domain($host);
    } else {
        $host = domain($address);
    }
}
echo $host;
?>
This returns google.co.uk. For 'http://sub1.sub2.sub3.example.com:443' it returns example.com, and for 'example.com' it returns example.com.

parse_url() often returns no host when the input is just a domain such as google.com (without a scheme). With this code, google.com and google.co.uk are both handled the same way. It may be a little dirty, but it works well for me; I use it to group the Internet access logs from Squid. Regards.

David Beck (2011-07-03 22:54:04)

A lengthy function to correct (mainly user-submitted) URLs:
<?php
function correctURL($address) {
    if (!empty($address) AND $address[0] != '#'
            AND strpos(strtolower($address), 'mailto:') === FALSE
            AND strpos(strtolower($address), 'javascript:') === FALSE) {
        // Resolve "../" by removing each ".." together with the segment before it
        $address = explode('/', $address);
        $keys = array_keys($address, '..');
        foreach ($keys AS $keypos => $key) {
            array_splice($address, $key - ($keypos * 2 + 1), 2);
        }
        $address = implode('/', $address);
        $address = str_replace('./', '', $address);

        // Default to http:// when no scheme is given
        $scheme = parse_url($address);
        if (empty($scheme['scheme'])) {
            $address = 'http://' . $address;
        }

        $parts = parse_url($address);
        $address = strtolower($parts['scheme']) . '://';
        if (!empty($parts['user'])) {
            $address .= $parts['user'];
            if (!empty($parts['pass'])) {
                $address .= ':' . $parts['pass'];
            }
            $address .= '@';
        }
        if (!empty($parts['host'])) {
            // Treat commas as mistyped dots; append ".com" to a bare name
            $host = str_replace(',', '.', strtolower($parts['host']));
            if (strpos(preg_replace('/^www\./', '', $host), '.') === FALSE) {
                $host .= '.com';
            }
            $address .= $host;
        }
        if (!empty($parts['port'])) {
            $address .= ':' . $parts['port'];
        }
        $address .= '/';
        if (!empty($parts['path'])) {
            $path = trim($parts['path'], ' /\\');
            if (!empty($path) AND strpos($path, '.') === FALSE) {
                $path .= '/';
            }
            $address .= $path;
        }
        if (!empty($parts['query'])) {
            $address .= '?' . $parts['query'];
        }
        return $address;
    } else {
        return FALSE;
    }
}
?>

Simon D (2011-06-21 06:45:34)

To get the URL query parameters as an associative array, use this function:
<?php
/**
 * Returns the URL query as an associative array
 *
 * @param string $query
 * @return array params
 */
function convertUrlQuery($query) {
    $queryParts = explode('&', $query);
    $params = array();
    foreach ($queryParts as $param) {
        // Split on the first "=" only; a field without "=" gets an empty value
        $item = explode('=', $param, 2);
        $params[$item[0]] = isset($item[1]) ? $item[1] : '';
    }
    return $params;
}
?>

Egor Chernodarov (2011-04-06 01:34:52)

I noticed the following difference in error handling:
<?php
print_r(parse_url('ftp://user:password@host:port'));
?>
In PHP 5.2.6 this returns:
Array
(
    [scheme] => ftp
    [host] => host
    [user] => user
    [pass] => password
)
The port is simply skipped. In PHP 5.3.6, however, it returns NULL without any warning.

bahtiar at gadimov dot de (2010-12-16 05:38:47)

Hi, if you have problems with UTF-8-encoded URLs, see http://bugs.php.net/bug.php?id=52923. parse_url() breaks UTF-8. :( You have to implement it yourself.

Mark Dobrinic (2010-12-10 02:55:44)

It seems the host part only strips the last [:port] off the end of the host name. When something is wrong in the actual request, this turns out to be the wrong approach. It would be better not to strip off the last [:port], but to keep the string before the first [:port] as the host name.

For example, a (possibly malformed) HTTP_HOST of hostname:443:443 resolved to 'host' => 'hostname:443', which caused problems for me. The solution is to enforce this yourself, explicitly:
<?php
$p = parse_url($url);
$host = explode(':', $p['host']);
$hostname = $host[0];
?>
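If you apply this to Host header values, be aware that a plain explode(':') also cuts apart bracketed IPv6 literals such as `[::1]:8080`. The following sketch keeps them intact. It assumes PHP 7+, and `host_without_port()` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: reduce a Host header value (or a 'host' element) to the
 * bare host name by keeping everything before the first port separator.
 * Bracketed IPv6 literals such as "[::1]:8080" are kept whole, because the
 * address itself contains colons.
 */
function host_without_port(string $host): string
{
    if ($host !== '' && $host[0] === '[') {
        $end = strpos($host, ']');
        return $end === false ? $host : substr($host, 0, $end + 1);
    }
    $colon = strpos($host, ':');
    return $colon === false ? $host : substr($host, 0, $colon);
}

$hostname = host_without_port('hostname:443:443');
```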

jesse at example dot com (2010-09-25 19:48:48)

@solenoid: Your code was very helpful, but it fails when the current URL has no query string (it appends '&' instead of '?' before the query). Below is a fixed version that catches this edge case:
<?php
function modify_url($mod) {
    $url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
    $query = explode("&", $_SERVER['QUERY_STRING']);
    // Start new parameters with "?" if there is no query string yet
    if (!$_SERVER['QUERY_STRING']) {
        $queryStart = "?";
    } else {
        $queryStart = "&";
    }
    // modify/delete data
    foreach ($query as $q) {
        list($key, $value) = array_pad(explode("=", $q, 2), 2, '');
        if (array_key_exists($key, $mod)) {
            if ($mod[$key]) {
                $url = preg_replace('/' . preg_quote($key . '=' . $value, '/') . '/', $key . '=' . $mod[$key], $url);
            } else {
                $url = preg_replace('/&?' . preg_quote($key . '=' . $value, '/') . '/', '', $url);
            }
        }
    }
    // add new data
    foreach ($mod as $key => $value) {
        if ($value && !preg_match('/' . preg_quote($key, '/') . '=/', $url)) {
            $url .= $queryStart . $key . '=' . $value;
            $queryStart = "&"; // any further parameters follow with "&"
        }
    }
    return $url;
}
?>

martin at planio dot com (2010-09-17 07:02:51)

If you send URLs in HTML e-mails with a redirect address in the query string, note that Hotmail unescapes the / and : characters in the query string, which breaks the parse_url() call. Take this as an example:
href="http://example.com/redirect?url=http%3A%2F%2Fplanio.com"
Hotmail transforms it into:
href="http://example.com/redirect?url=http://planio.com"
The solution is to be preventive before calling parse_url():
<?php
$q_index = strpos($uri, '?');
if ($q_index !== FALSE &&
    (strpos($uri, ':', $q_index) !== FALSE || strpos($uri, '/', $q_index) !== FALSE)) {
    $begin = substr($uri, 0, $q_index);
    $end = substr($uri, $q_index, strlen($uri) - $q_index);
    // Re-escape the characters Hotmail unescaped inside the query string
    $end = str_replace('/', '%2F', $end);
    $end = str_replace(':', '%3A', $end);
    $uri = $begin . $end;
}
?>

przemek at sobstel dot org (2010-08-19 13:44:07)

If you want to get the host, note that the function returns NULL when you pass only a host name without a scheme. Example:
<?php
parse_url($url, PHP_URL_HOST);
?>
$url                          => value returned
http://example.com            => string 'example.com' (length=11)
http://www.example.com        => string 'www.example.com' (length=15)
http://www.example.com:8080   => string 'www.example.com' (length=15)
example.com                   => null
www.example.com               => null
example.com:8080              => string 'example.com' (length=11)
www.example.com:8080          => string 'www.example.com' (length=15)
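One way to get a host out of scheme-less input is to treat it as a protocol-relative URL, which parse_url() handles since PHP 5.4.7. The sketch below assumes PHP 7.1+ and an input that starts with a host name; a relative path such as `images/logo.png` would be misread as a host. `host_of()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: return the host even when the input has no scheme,
 * by prefixing "//" so that parse_url() treats it as protocol-relative.
 * Assumes scheme-less input begins with a host name, not a relative path.
 */
function host_of(string $url): ?string
{
    $hasScheme = preg_match('~^[a-z][a-z0-9+.-]*://~i', $url) === 1;
    if (!$hasScheme && strncmp($url, '//', 2) !== 0) {
        $url = '//' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}
```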

solenoid at example dot com (2010-04-22 16:05:14)

Here's a piece of code that modifies, replaces or removes URL query parameters. It is typically useful for paging, where the URL carries more parameters than just the page number.
<?php
function modify_url($mod) {
    $url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
    $query = explode("&", $_SERVER['QUERY_STRING']);
    // modify/delete data
    foreach ($query as $q) {
        list($key, $value) = explode("=", $q);
        if (array_key_exists($key, $mod)) {
            if ($mod[$key]) {
                $url = preg_replace('/' . $key . '=' . $value . '/', $key . '=' . $mod[$key], $url);
            } else {
                $url = preg_replace('/&?' . $key . '=' . $value . '/', '', $url);
            }
        }
    }
    // add new data
    foreach ($mod as $key => $value) {
        if ($value && !preg_match('/' . $key . '=/', $url)) {
            $url .= '&' . $key . '=' . $value;
        }
    }
    return $url;
}

// page url: "http://www.example.com/page.php?p=5&show=list&style=23"
$url = modify_url(array('p' => 4, 'show' => 'column'));
// $url = "http://www.example.com/page.php?p=4&show=column&style=23"
?>
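An alternative approach avoids building regular expressions from unescaped keys and values. It splits off the query string, decodes it with parse_str(), changes the array, and re-encodes it with http_build_query(). In this sketch the URL is passed as an argument instead of being read from $_SERVER, and a null value means "remove this parameter"; both choices are assumptions. parse_str() turns dots and spaces in key names into underscores, and http_build_query() re-encodes values, so untouched parameters may not come back byte-for-byte. PHP 7+ is assumed, and `modify_query()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: set, replace or remove query parameters of $url.
 * A null value in $mod removes that parameter; other values are set.
 * The fragment, if any, is preserved.
 */
function modify_query(string $url, array $mod): string
{
    // Split off the fragment first, then the query string.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }
    $query = '';
    $qPos = strpos($url, '?');
    if ($qPos !== false) {
        $query = substr($url, $qPos + 1);
        $url = substr($url, 0, $qPos);
    }

    parse_str($query, $params);   // an empty query yields an empty array
    foreach ($mod as $key => $value) {
        if ($value === null) {
            unset($params[$key]);
        } else {
            $params[$key] = $value;  // existing keys keep their position
        }
    }

    $newQuery = http_build_query($params);
    return $url . ($newQuery !== '' ? '?' . $newQuery : '') . $fragment;
}

$paged = modify_query('http://www.example.com/page.php?p=5&show=list&style=23', ['p' => 4, 'show' => 'column']);
```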

james at roundeights dot com (2010-02-26 11:24:33)

I was writing unit tests and needed this function to fail and return FALSE in order to test a specific execution path. If anyone else needs to force a failure, the following inputs work:
<?php
parse_url("http:///example.com");
parse_url("http://:80");
parse_url("http://user@:80");
?>
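In application code the reverse concern applies. Since PHP 5.3.3 a malformed URL raises no E_WARNING (see the changelog at the top), so the `false` return value is the only signal. The sketch below makes that check explicit. It assumes PHP 7.1+, and `parse_url_or_null()` is a hypothetical wrapper.

```php
<?php
/**
 * Hypothetical wrapper: turn parse_url()'s false (seriously malformed URL)
 * into null, so callers have to handle the failure case explicitly.
 */
function parse_url_or_null(string $url): ?array
{
    $parts = parse_url($url);
    return $parts === false ? null : $parts;
}

foreach (['http:///example.com', 'http://:80', 'http://example.com/'] as $candidate) {
    echo $candidate, ' => ', parse_url_or_null($candidate) === null ? 'malformed' : 'ok', "\n";
}
```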

need_sunny at yahoo dot com (2009-12-25 05:57:02)

Thanks to xellisx for his parse_query function. I used it in one of my projects and it works well, but it had an error. I fixed the error and improved it a little. Here is my version:
<?php
// Originally written by xellisx
function parse_query($var) {
    /**
     * Use this function to parse out the query array element from
     * the output of parse_url().
     */
    $var = parse_url($var, PHP_URL_QUERY);
    $var = html_entity_decode((string) $var);
    $var = explode('&', $var);
    $arr = array();

    foreach ($var as $val) {
        // Split on the first "=" only; a field without "=" gets an empty value
        $x = explode('=', $val, 2);
        $arr[$x[0]] = isset($x[1]) ? $x[1] : '';
    }
    unset($val, $x, $var);
    return $arr;
}
?>
The first line used to read parse_query($val); I changed it to $var. Before this fix the function returned no useful data. I also added the parse_url() line, so the function now works only on the query part, not the whole URL. This is useful if you do something like:
<?php
$my_GET = parse_query($_SERVER['REQUEST_URI']);
?>

nirazuelos at gmail dot com (2009-10-09 14:45:45)

Hello. parse_url() returns the host (e.g. example.com) as the path when no scheme is provided in the input URL. So I've written a quick function to get the real host:
<?php
function getHost($Address) {
    $parseUrl = parse_url(trim($Address));
    if (!empty($parseUrl['host'])) {
        return trim($parseUrl['host']);
    }
    // No scheme: the host is the first segment of the path
    $pathParts = explode('/', $parseUrl['path'], 2);
    return trim(array_shift($pathParts));
}

getHost("example.com"); // Gives example.com
getHost("http://example.com"); // Gives example.com
getHost("www.example.com"); // Gives www.example.com
getHost("http://example.com/xyz"); // Gives example.com
?>
Try it with anything! It returns the host (including the subdomain, if there is one). I hope it helps.

ap dot public1 at gmail dot com (2009-07-14 20:36:56)

A simple static library for easy manipulation of URL parameters:
<?php
/**
 * File provides an easy way to manipulate URL parameters
 * @author Alexander Podgorny
 */
class Url {
    /**
     * Splits a URL into an array of its pieces as follows:
     * [scheme]://[user]:[pass]@[host]:[port]/[path]?[query]#[fragment]
     * In addition it adds a 'query_params' key which contains an array of
     * URL-decoded key-value pairs
     *
     * @param String $sUrl Url
     * @return Array Parsed url pieces
     */
    public static function explode($sUrl) {
        $aUrl = parse_url($sUrl);
        $aUrl['query_params'] = array();
        $aPairs = explode('&', isset($aUrl['query']) ? $aUrl['query'] : '');
        foreach ($aPairs as $sPair) {
            if (trim($sPair) == '') {
                continue;
            }
            list($sKey, $sValue) = array_pad(explode('=', $sPair, 2), 2, '');
            $aUrl['query_params'][$sKey] = urldecode($sValue);
        }
        return $aUrl;
    }

    /**
     * Compiles a URL out of an array of its pieces (as returned by explode())
     * 'query' is ignored if 'query_params' is present
     *
     * @param Array $aUrl Array of url pieces
     */
    public static function implode($aUrl) {
        //[scheme]://[user]:[pass]@[host]:[port]/[path]?[query]#[fragment]

        $sQuery = '';

        // Compile query
        if (isset($aUrl['query_params']) && is_array($aUrl['query_params'])) {
            $aPairs = array();
            foreach ($aUrl['query_params'] as $sKey => $sValue) {
                $aPairs[] = $sKey . '=' . urlencode($sValue);
            }
            $sQuery = implode('&', $aPairs);
        } else {
            $sQuery = isset($aUrl['query']) ? $aUrl['query'] : '';
        }

        // Compile url
        $sUrl = $aUrl['scheme'] . '://'
            . (isset($aUrl['user']) && $aUrl['user'] != '' && isset($aUrl['pass'])
                ? $aUrl['user'] . ':' . $aUrl['pass'] . '@' : '')
            . $aUrl['host']
            . (isset($aUrl['port']) ? ':' . $aUrl['port'] : '')
            . (isset($aUrl['path']) && $aUrl['path'] != '' ? $aUrl['path'] : '')
            . ($sQuery != '' ? '?' . $sQuery : '')
            . (isset($aUrl['fragment']) && $aUrl['fragment'] != '' ? '#' . $aUrl['fragment'] : '');
        return $sUrl;
    }

    /**
     * Parses a URL and returns an array of key-value pairs of its parameters
     *
     * @param String $sUrl
     * @return Array
     */
    public static function getParams($sUrl) {
        $aUrl = self::explode($sUrl);
        return $aUrl['query_params'];
    }

    /**
     * Removes existing URL params and sets them to those specified in $aParams
     *
     * @param String $sUrl Url
     * @param Array $aParams Array of key-value pairs to set URL params to
     * @return String Newly compiled url
     */
    public static function setParams($sUrl, $aParams) {
        $aUrl = self::explode($sUrl);
        $aUrl['query'] = '';
        $aUrl['query_params'] = $aParams;
        return self::implode($aUrl);
    }

    /**
     * Updates values of existing URL params and/or adds (if not set) those specified in $aParams
     *
     * @param String $sUrl Url
     * @param Array $aParams Array of key-value pairs to set URL params to
     * @return String Newly compiled url
     */
    public static function updateParams($sUrl, $aParams) {
        $aUrl = self::explode($sUrl);
        $aUrl['query'] = '';
        $aUrl['query_params'] = array_merge($aUrl['query_params'], $aParams);
        return self::implode($aUrl);
    }
}
?>

vuatintac at yahoo dot com (2009-04-22 19:54:58)

Be careful when using the parse_str() function on a full URL. See:
<?php
$sLink = "http://localhost/khm/search.php?act=result&q=a";
parse_str($sLink, $vars);
print_r($vars);
echo $sLink = rawurldecode(http_build_query($vars));

// will output:
// Array
// (
//     [http://localhost/khm/search_php?act] => result
//     [q] => a
// )
// http://localhost/khm/search_php?act=result&q=a
// search.php becomes search_php
?>
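The usual fix is to pass parse_str() only the query component that parse_url() extracts. The sketch below assumes PHP 7+, and `query_params()` is a hypothetical name.

```php
<?php
/**
 * Sketch: decode only the query component, so the scheme, host and path are
 * never treated as a parameter name (and "search.php" is not turned into
 * "search_php"). Returns an empty array when the URL has no query string.
 */
function query_params(string $url): array
{
    $query = parse_url($url, PHP_URL_QUERY);
    $params = [];
    if (is_string($query)) {
        parse_str($query, $params);
    }
    return $params;
}

$vars = query_params('http://localhost/khm/search.php?act=result&q=a');
```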

theoriginalmarksimpson at gmail dot com (2009-04-17 17:08:49)

An update to the function by FredLudd at gmail dot com. I added IPv6 support as well.
<?php
function j_parseUrl($url) {
    $r  = "(?:([a-z0-9+-._]+)://)?";
    $r .= "(?:";
    $r .=   "(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9a-f]{2})*)@)?";
    $r .=   "(?:\[((?:[a-z0-9:])*)\])?";
    $r .=   "((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9a-f]{2})*)";
    $r .=   "(?::(\d*))?";
    $r .=   "(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9a-f]{2})*)?";
    $r .=   "|";
    $r .=   "(/?";
    $r .=     "(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9a-f]{2})+";
    $r .=     "(?:[a-z0-9-._~!$&'()*+,;=:@\/]|%[0-9a-f]{2})*";
    $r .=   ")?";
    $r .= ")";
    $r .= "(?:\?((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9a-f]{2})*))?";
    $r .= "(?:#((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9a-f]{2})*))?";
    preg_match("`$r`i", $url, $match);
    $parts = array(
        "scheme"    => '',
        "userinfo"  => '',
        "authority" => '',
        "host"      => '',
        "port"      => '',
        "path"      => '',
        "query"     => '',
        "fragment"  => '');
    // Intentional fall-through: fill in every part that was matched
    switch (count($match)) {
        case 10: $parts['fragment'] = $match[9];
        case 9: $parts['query'] = $match[8];
        case 8: $parts['path'] = $match[7];
        case 7: $parts['path'] = $match[6] . $parts['path'];
        case 6: $parts['port'] = $match[5];
        case 5: $parts['host'] = $match[3] ? "[" . $match[3] . "]" : $match[4];
        case 4: $parts['userinfo'] = $match[2];
        case 3: $parts['scheme'] = $match[1];
    }
    $parts['authority'] = ($parts['userinfo'] ? $parts['userinfo'] . "@" : "") .
                          $parts['host'] .
                          ($parts['port'] ? ":" . $parts['port'] : "");
    return $parts;
}
?>
When using the URL
"foo://username:password@[2001:4860:0:2001::68]:8042" .
"/over/there/index.dtb;type=animal?name=ferret#nose"
(split only because the line is too long for this site's comment handler), the original returns
Array ( [scheme] => foo [userinfo] => username:password [authority] => username:password@ [host] => [port] => [path] => [query] => [fragment] => )
The new one returns
Array ( [scheme] => foo [userinfo] => username:password [authority] => username:password@[2001:4860:0:2001::68]:8042 [host] => [2001:4860:0:2001::68] [port] => 8042 [path] => /over/there/index.dtb;type=animal [query] => name=ferret [fragment] => nose )
All of the other examples FredLudd used below still work exactly the same.

usrflo at gmx dot de (2009-04-12 05:05:39)

Additional information: if you'd like to compute the registered domain name of the parsed host name, you can use the PHP library at http://www.dkim-reputation.org/regdom-libs/. It relies on the Mozilla Foundation's effective TLD list.
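The idea behind such a library can be illustrated with a simplified longest-suffix lookup against a public-suffix list. This sketch is not that library's API. `registered_domain()` is a hypothetical name, and the caller passes in the suffix list; a real list would be loaded from the Mozilla file. The list's wildcard (`*.`) and exception (`!`) rules are not handled. PHP 7.1+ is assumed.

```php
<?php
/**
 * Hypothetical helper: return the registered domain of $host, i.e. the
 * longest matching public suffix plus one label to its left. Returns null
 * if no suffix matches or if the host is itself a public suffix.
 * Simplification: wildcard and exception rules are ignored.
 */
function registered_domain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $n = count($labels);
    // Try candidates from longest to shortest; the first hit is the longest suffix.
    for ($i = 0; $i < $n; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return $i > 0 ? implode('.', array_slice($labels, $i - 1)) : null;
        }
    }
    return null;
}

$suffixes = ['uk', 'co.uk', 'com', 'org'];   // a tiny illustrative subset
$domain = registered_domain('clients1.sub3.google.co.uk', $suffixes);
```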

pbcomm at gmail dot com (2008-10-25 21:55:17)

A modification to the code from theoriginalmarksimpson at gmail dot com. Change:
$r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
to:
$r .= "(?:(?P<login>\w+):?(?P<pass>\w+)?@)?";
This covers the case where only a username is present in the URL:
http://username@subdomain.domain.com/index.php?arg1=test#anchor

vdklah at hotmail dot com (2008-10-17 02:53:54)

An example that determines the URL port. When no port is specified, it derives the port from the scheme.
<?php
function getUrlPort($urlInfo) {
    if (isset($urlInfo['port'])) {
        $port = $urlInfo['port'];
    } else { // no port specified; get the default port
        if (isset($urlInfo['scheme'])) {
            switch ($urlInfo['scheme']) {
                case 'http':
                    $port = 80; // default for http
                    break;
                case 'https':
                    $port = 443; // default for https
                    break;
                case 'ftp':
                    $port = 21; // default for ftp
                    break;
                case 'ftps':
                    $port = 990; // default for ftps
                    break;
                default:
                    $port = 0; // error; unsupported scheme
                    break;
            }
        } else {
            $port = 0; // error; unknown scheme
        }
    }
    return $port;
}

$url = "http://nl3.php.net/manual/en/function.parse-url.php";
$urlInfo = parse_url($url);
$urlPort = getUrlPort($urlInfo);
if ($urlPort !== 0) {
    print 'Found URL port: ' . $urlPort;
} else {
    print 'ERROR: Could not find port at URL: ' . $url;
}
?>

gautam at rogers dot com (2008-10-15 11:08:46)

What about using something like this to safely encode all the values passed in the query portion?
Example input: http://www.example.com/?first=john&last=smith&email=john@smith.com
Result: http://www.example.com/?first=john&last=smith&email=john%40smith.com
<?php
function safe_url($url) {
    // Make sure we have a string to work with
    if (!empty($url)) {
        // Explode into URL keys
        $urllist = parse_url($url);

        // Make sure we have a valid result set and a query field
        if (is_array($urllist) && isset($urllist["query"])) {
            // Explode into key/value array
            $keyvalue_list = explode("&", ($urllist["query"]));

            // Store resulting key/value pairs
            $keyvalue_result = array();

            foreach ($keyvalue_list as $key => $value) {
                // Explode each individual key/value into an array
                $keyvalue = explode("=", $value);

                // Make sure we have a "key=value" array
                if (count($keyvalue) == 2) {
                    // Encode the value portion
                    $keyvalue[1] = urlencode($keyvalue[1]);

                    // Add our key and encoded value into the result
                    array_push($keyvalue_result, implode("=", $keyvalue));
                }
            }

            // Repopulate our query key with encoded results
            $urllist["query"] = implode("&", $keyvalue_result);

            // Build the final output URL
            $url = (isset($urllist["scheme"]) ? $urllist["scheme"] . "://" : "") .
                   (isset($urllist["user"]) ? $urllist["user"] . (isset($urllist["pass"]) ? ":" . $urllist["pass"] : "") . "@" : "") .
                   (isset($urllist["host"]) ? $urllist["host"] : "") .
                   (isset($urllist["port"]) ? ":" . $urllist["port"] : "") .
                   (isset($urllist["path"]) ? $urllist["path"] : "") .
                   (isset($urllist["query"]) ? "?" . $urllist["query"] : "") .
                   (isset($urllist["fragment"]) ? "#" . $urllist["fragment"] : "");
        }
    }
    return $url;
}
?>

FredLudd at gmail dot com (2008-10-03 11:24:42)

Another attempt at finding a better parser. I noticed that the laulibrius/theoriginalmarksimpson functions didn't quite handle the URL of the page they were displayed on. For my mirror, ca3, this is http://ca3.php.net/manual/en/function.parse-url.php. Running it through the function gives:
scheme => http, login => , pass => , host => ca3.php.net, ip => , subdomain => ca3, domain => php., extension => net, port => , path => /manual/en/function.parse, file => function.parse
That is, the file name gets a bit mangled. Rather than tweak the function's regular expression yet again, I opted to adapt a RegExp that has served me well in JavaScript:
<?php
function j_parseUrl($url) {
    $r  = "(?:([a-z0-9+-._]+)://)?";
    $r .= "(?:";
    $r .=   "(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9a-f]{2})*)@)?";
    $r .=   "((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9a-f]{2})*)";
    $r .=   "(?::(\d*))?";
    $r .=   "(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9a-f]{2})*)?";
    $r .=   "|";
    $r .=   "(/?";
    $r .=     "(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9a-f]{2})+";
    $r .=     "(?:[a-z0-9-._~!$&'()*+,;=:@\/]|%[0-9a-f]{2})*";
    $r .=   ")?";
    $r .= ")";
    $r .= "(?:\?((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9a-f]{2})*))?";
    $r .= "(?:#((?:[a-z0-9-._~!$&'()*+,;=:\/?@]|%[0-9a-f]{2})*))?";
    preg_match("`$r`i", $url, $match);
    $parts = array(
        "scheme"    => '',
        "userinfo"  => '',
        "authority" => '',
        "host"      => '',
        "port"      => '',
        "path"      => '',
        "query"     => '',
        "fragment"  => '');
    // Intentional fall-through: fill in every part that was matched
    switch (count($match)) {
        case 9: $parts['fragment'] = $match[8];
        case 8: $parts['query'] = $match[7];
        case 7: $parts['path'] = $match[6];
        case 6: $parts['path'] = $match[5] . $parts['path'];
        case 5: $parts['port'] = $match[4];
        case 4: $parts['host'] = $match[3];
        case 3: $parts['userinfo'] = $match[2];
        case 2: $parts['scheme'] = $match[1];
    }
    $parts['authority'] = ($parts['userinfo'] ? $parts['userinfo'] . "@" : "") .
                          $parts['host'] .
                          ($parts['port'] ? ":" . $parts['port'] : "");
    return $parts;
}
?>
When fed "http://ca3.php.net/manual/en/function.parse-url.php", this function returns
scheme => http, userinfo => , authority => ca3.php.net, host => ca3.php.net, port => , path => /manual/en/function.parse-url.php, query => , fragment =>
which is somewhat closer to my needs. But everything should be tested against the two examples provided by RFC 3986,
"foo://username:password@example.com:8042" .
"/over/there/index.dtb;type=animal?name=ferret#nose"
(split only because the line is too long for this site's comment handler) and
"urn:example:animal:ferret:nose"
The native parse_url() performs admirably on the "urn:" example. Mine fails to pick out the path ("example:animal:ferret:nose"), and the laulibrius/theoriginalmarksimpson function can't decipher anything there. On the "foo:" example, both my function and parse_url() get it right, while the other examples on this page don't. The laulibrius/theoriginalmarksimpson function delivers
scheme => foo, login => username, pass => password, host => example.com, ip => , subdomain => , domain => example., extension => com, port => 8042, path => /over/there/index.dtb, file => index.dtb
As you can see, the query string ("name=ferret") and the fragment ("nose") have been dropped, as has the parameter ("type=animal").

xellisx (2008-10-01 08:37:42)

I needed to parse the query string out of the referrer, so I created this function:
<?php
function parse_query($val) {
    /**
     * Use this function to parse out the query array element from
     * the output of parse_url().
     */
    $var = html_entity_decode($var);
    $var = explode('&', $var);
    $arr = array();

    foreach ($var as $val) {
        $x = explode('=', $val);
        $arr[$x[0]] = $x[1];
    }
    unset($val, $x, $var);
    return $arr;
}
?>

ilja at radusch dot com (2008-09-26 04:26:50)

Here is an update to the glue_url() function. It can now handle relative URLs if only 'path' is provided.
<?php
function glue_url($parsed) {
    if (!is_array($parsed)) {
        return false;
    }

    $uri = isset($parsed['scheme']) ? $parsed['scheme'] . ':' . ((strtolower($parsed['scheme']) == 'mailto') ? '' : '//') : '';
    $uri .= isset($parsed['user']) ? $parsed['user'] . (isset($parsed['pass']) ? ':' . $parsed['pass'] : '') . '@' : '';
    $uri .= isset($parsed['host']) ? $parsed['host'] : '';
    $uri .= isset($parsed['port']) ? ':' . $parsed['port'] : '';

    if (isset($parsed['path'])) {
        $uri .= (substr($parsed['path'], 0, 1) == '/') ?
            $parsed['path'] : ((!empty($uri) ? '/' : '') . $parsed['path']);
    }

    $uri .= isset($parsed['query']) ? '?' . $parsed['query'] : '';
    $uri .= isset($parsed['fragment']) ? '#' . $parsed['fragment'] : '';

    return $uri;
}
?>

nospam at spellingcow dot com (2008-09-08 14:03:58)

URLs in the query string of a relative URL cause a problem:
Fails: /page.php?foo=bar&url=http://www.example.com
Parses: http://www.foo.com/page.php?foo=bar&url=http://www.example.com

theoriginalmarksimpson at gmail dot com (2008-09-05 00:12:11)

A rehash of the code modified by "laulibrius at hotmail dot com". It also parses URLs whose host has no domain name and is just an IP address. The old code assumed that the IP octets were a subdomain, so the URL "http://255.255.255.255/" would return 255.255 as a subdomain of 255.255.
<?php
function parseUrl($url) {
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    $ip = "(?:[0-9]{1,3}+\.){3}+[0-9]{1,3}"; // IP check
    $s  = "(?P<subdomain>[-\w\.]+)\.)?";    // subdomain
    $d  = "(?P<domain>[-\w]+\.)";           // domain
    $e  = "(?P<extension>\w+)";             // extension
    $r .= "(?P<host>(?(?=" . $ip . ")(?P<ip>" . $ip . ")|(?:" . $s . $d . $e . "))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!"; // Delimiters

    preg_match($r, $url, $out);
    return $out;
}
?>
If you need to validate the host IP, this is easier than using a regex:
<?php
$parsed = parseUrl($url);
if (!empty($parsed['ip'])) {
    if (long2ip(ip2long($parsed['ip'])) == $parsed['ip']) { // validates the IP
        echo $parsed['ip'] . " is a valid host";
    } else {
        echo $parsed['ip'] . " is not a valid host";
    }
}
?>
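Another option, available since PHP 5.2, is filter_var() with FILTER_VALIDATE_IP, which also covers IPv6. The sketch below assumes PHP 7+, and `host_is_ip()` is a hypothetical name.

```php
<?php
/**
 * Sketch: decide whether a URL's host is a literal IP address using
 * filter_var() with FILTER_VALIDATE_IP instead of a hand-written regex.
 * parse_url() keeps the square brackets of IPv6 literals, so they are trimmed first.
 */
function host_is_ip(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    return filter_var(trim($host, '[]'), FILTER_VALIDATE_IP) !== false;
}
```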

marco panichi (2008-08-23 01:47:18)

My function takes the URL the user typed into the browser and does the same job as parse_url(), but better, I think. I don't like parse_url() because it says nothing about elements it doesn't find in the URL; my function returns an empty string for them instead.
<?php
function get_url() {
    $arr = array();
    $uri = $_SERVER['REQUEST_URI'];

    // query
    $x = array_pad(explode('?', $uri), 2, false);
    $arr['query'] = ($x[1]) ? $x[1] : '';

    // resource
    $x = array_pad(explode('/', $x[0]), 2, false);
    $x_last = array_pop($x);
    if (strpos($x_last, '.') === false) {
        $arr['resource'] = '';
        $x[] = $x_last;
    } else {
        $arr['resource'] = $x_last;
    }

    // path
    $arr['path'] = implode('/', $x);
    if (substr($arr['path'], -1) !== '/') $arr['path'] .= '/';

    // domain
    $arr['domain'] = $_SERVER['SERVER_NAME'];

    // scheme
    // Note: SERVER_PROTOCOL is e.g. "HTTP/1.1" even for HTTPS requests,
    // so this yields "http" in both cases.
    $server_prt = explode('/', $_SERVER['SERVER_PROTOCOL']);
    $arr['scheme'] = strtolower($server_prt[0]);

    // url
    $arr['url'] = $arr['scheme'] . '://' . $arr['domain'] . $uri;

    return $arr;
}
?>
PS: I found that working with explode() is faster than using preg_match() (I tested with a getmicrotime function and 'for' loops).
PPS: I used array_pad() to prevent notices.

andrewtheartist at hotmail dot com (2008-06-28 17:28:58)

Here's the easiest way to get the URL of the directory your script is in (not the script name itself, just the complete URL of the folder it's in): <?php echo "http://" . $_SERVER['HTTP_HOST'] . dirname($_SERVER['PHP_SELF']); ?>
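A slightly more defensive sketch of the same one-liner is shown below. It is an assumption-laden variant, not the commenter's code: script_dir_url() is a hypothetical name, it takes the server array as a parameter so it can be tested, it picks "https" when $_SERVER['HTTPS'] is set to something other than "off", and it normalizes Windows backslashes from dirname() and always ends with a trailing slash. Keep in mind that HTTP_HOST comes from the client's Host header.

```php
<?php
// Sketch: build the URL of the directory containing the current script.
// Pass $_SERVER in real code; the parameter makes the function testable.
function script_dir_url(array $server)
{
    $https = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    // dirname() may return "\" on Windows for the root, so normalize slashes
    $dir = rtrim(str_replace('\\', '/', dirname($server['PHP_SELF'])), '/');
    return ($https ? 'https' : 'http') . '://' . $server['HTTP_HOST'] . $dir . '/';
}
```

Usage: `script_dir_url($_SERVER)`.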

Cool Coyote (2008-06-23 05:35:22)

Based on the "laulibrius at hotmail dot com" function, this works for relative URLs only:

<?php
function parseUrl($url) {
    $r  = "^(?:(?P<path>[\.\w/]*/)?";
    $r .= "(?P<file>\w+(?:\.\w+)?)?)\.(?P<extension>\w+)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!";
    preg_match($r, $url, $out);
    return $out;
}
print_r(parseUrl("../test/faq.php?p=1&v=blabla#X1"));
?>

returns:

Array ( [0] => ../test/faq.php?p=1&v=blabla#X1 [path] => ../test/ [1] => ../test/ [file] => faq [2] => faq [extension] => php [3] => php [arg] => p=1&v=blabla [4] => p=1&v=blabla [anchor] => X1 [5] => X1 )
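The same split can be sketched without a regex by running parse_url() on the relative URL and pathinfo() on its path. This is an alternative technique, not the commenter's; split_relative_url() and its array keys are hypothetical. One difference: pathinfo() reports the directory without a trailing slash ("../test" rather than "../test/").

```php
<?php
// Sketch: decompose a relative URL using parse_url() + pathinfo().
function split_relative_url($url)
{
    $parts = parse_url($url);
    if ($parts === false) {
        $parts = array(); // treat a malformed URL as having no components
    }
    $path = isset($parts['path']) ? $parts['path'] : '';
    $info = pathinfo($path);
    return array(
        'dir'       => isset($info['dirname']) ? $info['dirname'] : '',
        'file'      => isset($info['filename']) ? $info['filename'] : '',
        'extension' => isset($info['extension']) ? $info['extension'] : '',
        'query'     => isset($parts['query']) ? $parts['query'] : '',
        'fragment'  => isset($parts['fragment']) ? $parts['fragment'] : '',
    );
}
```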

laulibrius at hotmail dot com (2008-06-16 09:31:48)

There was one thing missing from the function posted by "to1ne at hotmail dot com" when I tried it: the domain and subdomain couldn't contain a dash "-". So I added it to the regexp, and the function now looks like this:

<?php
function parseUrl($url) {
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    $r .= "(?P<host>(?:(?P<subdomain>[-\w\.]+)\.)?" . "(?P<domain>[-\w]+\.(?P<extension>\w+)))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!"; // Delimiters
    preg_match($r, $url, $out);
    return $out;
}
?>

Btw, thanks for the function, it helps me a lot.
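If you only need to check that a host name is well formed (dashes included), a sketch using filter_var() may be simpler than extending the regex. This assumes PHP 7.0 or later, where FILTER_VALIDATE_DOMAIN and FILTER_FLAG_HOSTNAME exist; is_valid_hostname() is a hypothetical name. Note that it is stricter than `\w`: with FILTER_FLAG_HOSTNAME, a label may not start with a dash and underscores are rejected.

```php
<?php
// Sketch (PHP >= 7.0): validate a host name with the built-in domain filter.
// FILTER_FLAG_HOSTNAME limits labels to letters, digits and hyphens.
function is_valid_hostname($host)
{
    return filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}
```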

to1ne at hotmail dot com (2008-06-13 11:01:45)

Based on the idea of "jbr at ya-right dot com", I have been working on a new function to parse URLs:

<?php
function parseUrl($url) {
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    $r .= "(?P<host>(?:(?P<subdomain>[\w\.]+)\.)?" . "(?P<domain>\w+\.(?P<extension>\w+)))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!"; // Delimiters
    preg_match($r, $url, $out);
    return $out;
}
print_r(parseUrl('me:you@sub.site.org:29000/pear/validate.html?happy=me&sad=you#url'));
?>

This returns:

Array ( [0] => me:you@sub.site.org:29000/pear/validate.html?happy=me&sad=you#url [scheme] => [1] => [login] => me [2] => me [pass] => you [3] => you [host] => sub.site.org [4] => sub.site.org [subdomain] => sub [5] => sub [domain] => site.org [6] => site.org [extension] => org [7] => org [port] => 29000 [8] => 29000 [path] => /pear/validate.html [9] => /pear/validate.html [file] => validate.html [10] => validate.html [arg] => happy=me&sad=you [11] => happy=me&sad=you [anchor] => url [12] => url )

So both named and numbered array keys are available. It's quite advanced, but I think it works in every case... Let me know if it doesn't.
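Since preg_match() stores every named group twice (under its name and under its number), a small sketch like the one below can keep only the named entries. It is a helper written for this note, not part of the comment; named_groups_only() is a hypothetical name, and ARRAY_FILTER_USE_KEY requires PHP 5.6 or later.

```php
<?php
// Sketch: drop the numeric keys from a preg_match() result,
// keeping only the entries produced by named groups.
function named_groups_only(array $matches)
{
    // Numeric keys are integers, named-group keys are strings
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}
```

Usage: `preg_match($r, $url, $out); $named = named_groups_only($out);`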

jbr at ya-right dot com (2008-05-02 20:24:27)

This function never works the way you think it should... Example:

<?php print_r(parse_url('me:you@sub.site.org/pear/validate.html?happy=me&sad=you#url')); ?>

Returns:

Array ( [scheme] => me [path] => you@sub.site.org/pear/validate.html [query] => happy=me&sad=you [fragment] => url )

Here is my way of doing parse_url:

<?php
function parseUrl($url) {
    $r  = '!(?:(\w+)://)?(?:(\w+)\:(\w+)@)?([^/:]+)?';
    $r .= '(?:\:(\d*))?([^#?]+)?(?:\?([^#]+))?(?:#(.+$))?!i';
    preg_match($r, $url, $out);
    return $out;
}
print_r(parseUrl('me:you@sub.site.org/pear/validate.html?happy=me&sad=you#url'));
?>

Returns:

Array ( [0] => me:you@sub.site.org/pear/validate.html?happy=me&sad=you#url [1] => [2] => me [3] => you [4] => sub.site.org [5] => [6] => /pear/validate.html [7] => happy=me&sad=you [8] => url )

where:

out[0] = full URL
out[1] = scheme, or '' if no scheme was found
out[2] = username, or '' if no auth username was found
out[3] = password, or '' if no auth password was found
out[4] = domain name, or '' if no domain name was found
out[5] = port number, or '' if no port number was found
out[6] = path, or '' if no path was found
out[7] = query, or '' if no query was found
out[8] = fragment, or '' if no fragment was found

Nicolas Merlet - admin(at)merletn.org (2008-03-14 07:05:24)

Please note that parse_url does not always seem to produce the same results when passed non-standard URLs. E.g. I had been using this code since 2005 (under both PHP 4.3.10 and PHP 5.2.3):

<?php
$p = parse_url('http://domain.tld/tcp://domain2.tld/dir/file');
$d2 = parse_url($p['path']);
echo $d2['path']; // returns '/dir/file'
?>

Of course my example is very specific, as the URL is not really valid. But parse_url was a great trick for splitting a URL easily (without regular expressions). Unfortunately, under PHP 5.2.0-8 (+etch10), parse_url fails because it does not accept a slash (/) at the beginning of the URL. Here is a possible patch:

<?php
$p = parse_url('http://domain.tld/tcp://domain2.tld/dir/file');
$d2 = parse_url(substr($p['path'], 1)); // strip the leading slash first
echo $d2['path']; // returns '/dir/file'
?>

However, this last code is not optimized at all and should be replaced by a regular expression that splits the URL (so that parse_url is no longer used). So use parse_url very carefully, and make sure you pass it only standard URLs...

Nick Smith (2007-09-05 02:32:28)

Note that older versions of PHP (e.g. 4.1) returned a blank string as the path for URLs without any path, such as http://www.php.net. More recent versions (e.g. 4.4.7) don't set the path element in the array at all, so old code will get a PHP warning about an undefined index.
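One way to sidestep the missing-key problem is to ask parse_url() for a single component, which yields null when that component is absent, and supply a default. A minimal sketch, assuming PHP 7.0+ for the `??` operator; url_path() is a hypothetical name.

```php
<?php
// Sketch (PHP >= 7.0): read the path without risking an undefined index.
// parse_url() with a component argument returns null if it is missing.
function url_path($url)
{
    return parse_url($url, PHP_URL_PATH) ?? '';
}
```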

Michael Muryn (2007-08-27 08:51:08)

Another update to the glue_url function: applied the "isset" treatment to $parsed['pass'].

<?php
function glue_url($parsed) {
    if (!is_array($parsed)) return false;
    // "mailto:" URLs take no "//" after the scheme
    $uri  = isset($parsed['scheme']) ? $parsed['scheme'] . ':' . ((strtolower($parsed['scheme']) == 'mailto') ? '' : '//') : '';
    $uri .= isset($parsed['user']) ? $parsed['user'] . (isset($parsed['pass']) ? ':' . $parsed['pass'] : '') . '@' : '';
    $uri .= isset($parsed['host']) ? $parsed['host'] : '';
    $uri .= isset($parsed['port']) ? ':' . $parsed['port'] : '';
    if (isset($parsed['path'])) {
        // After a host the path must start with "/"; without a host (e.g. mailto:) keep it as-is
        $uri .= (isset($parsed['host']) && substr($parsed['path'], 0, 1) != '/') ? '/' . $parsed['path'] : $parsed['path'];
    }
    $uri .= isset($parsed['query']) ? '?' . $parsed['query'] : '';
    $uri .= isset($parsed['fragment']) ? '#' . $parsed['fragment'] : '';
    return $uri;
}
?>

stevenlewis at hotmail dot com (2007-08-13 03:08:48)

An update to the glue_url function: you can now pass a host and a path that has no slash at its beginning.

<?php
function glue_url($parsed) {
    if (!is_array($parsed)) return false;
    $uri  = isset($parsed['scheme']) ? $parsed['scheme'] . ':' . ((strtolower($parsed['scheme']) == 'mailto') ? '' : '//') : '';
    $uri .= isset($parsed['user']) ? $parsed['user'] . ($parsed['pass'] ? ':' . $parsed['pass'] : '') . '@' : '';
    $uri .= isset($parsed['host']) ? $parsed['host'] : '';
    $uri .= isset($parsed['port']) ? ':' . $parsed['port'] : '';
    if (isset($parsed['path'])) {
        $uri .= (substr($parsed['path'], 0, 1) == '/') ? $parsed['path'] : '/' . $parsed['path'];
    }
    $uri .= isset($parsed['query']) ? '?' . $parsed['query'] : '';
    $uri .= isset($parsed['fragment']) ? '#' . $parsed['fragment'] : '';
    return $uri;
}
?>

spam at paulisageek dot com (2007-08-08 12:05:17)

In reply to adrian: thank you very much for your function. There is a small issue in how it handles protocol-relative URLs: you need to remove the // when putting the URL into the path. Here is the new function:

<?php
function resolve_url($base, $url) {
    if (!strlen($base)) return $url;
    // Step 2
    if (!strlen($url)) return $base;
    // Step 3
    if (preg_match('!^[a-z]+:!i', $url)) return $url;
    $base = parse_url($base);
    if ($url[0] == "#") {
        // Step 2 (fragment)
        $base['fragment'] = substr($url, 1);
        return unparse_url($base);
    }
    unset($base['fragment']);
    unset($base['query']);
    if (substr($url, 0, 2) == "//") {
        // Step 4
        return unparse_url(array(
            'scheme' => $base['scheme'],
            'path'   => substr($url, 2),
        ));
    } else if ($url[0] == "/") {
        // Step 5
        $base['path'] = $url;
    } else {
        // Step 6
        $path = explode('/', $base['path']);
        $url_path = explode('/', $url);
        // Step 6a: drop file from base
        array_pop($path);
        // Step 6b, 6c, 6e: append url while removing "." and ".." from
        // the directory portion
        $end = array_pop($url_path);
        foreach ($url_path as $segment) {
            if ($segment == '.') {
                // skip
            } else if ($segment == '..' && $path && $path[sizeof($path)-1] != '..') {
                array_pop($path);
            } else {
                $path[] = $segment;
            }
        }
        // Step 6d, 6f: remove "." and ".." from file portion
        if ($end == '.') {
            $path[] = '';
        } else if ($end == '..' && $path && $path[sizeof($path)-1] != '..') {
            $path[sizeof($path)-1] = '';
        } else {
            $path[] = $end;
        }
        // Step 6h
        $base['path'] = join('/', $path);
    }
    // Step 7
    return unparse_url($base);
}
?>

christian at resource-it dot dk (2007-08-03 12:57:19)

I searched for an implementation of RFC 3986, which is a newer version of RFC 2396. You can find one here: <http://www.chrsen.dk/fundanemt/files/scripter/php/misc/rfc3986.php> - read the RFC at <http://rfc.net/rfc3986.html>

adrian-php at sixfingeredman dot net (2007-07-25 14:58:40)

Here's a function that resolves a relative URL according to RFC 2396, section 5.2. No doubt there are more efficient implementations, but this one tries to stay close to the standard for clarity. It relies on a function called "unparse_url" to implement section 7, which is left as an exercise for the reader (or you can substitute the "glue_url" function posted earlier).

<?php
/**
 * Resolve a URL relative to a base path. This happens to work with POSIX
 * filenames as well. This is based on RFC 2396 section 5.2.
 */
function resolve_url($base, $url) {
    if (!strlen($base)) return $url;
    // Step 2
    if (!strlen($url)) return $base;
    // Step 3
    if (preg_match('!^[a-z]+:!i', $url)) return $url;
    $base = parse_url($base);
    if ($url[0] == "#") {
        // Step 2 (fragment)
        $base['fragment'] = substr($url, 1);
        return unparse_url($base);
    }
    unset($base['fragment']);
    unset($base['query']);
    if (substr($url, 0, 2) == "//") {
        // Step 4
        return unparse_url(array(
            'scheme' => $base['scheme'],
            'path'   => $url,
        ));
    } else if ($url[0] == "/") {
        // Step 5
        $base['path'] = $url;
    } else {
        // Step 6
        $path = explode('/', $base['path']);
        $url_path = explode('/', $url);
        // Step 6a: drop file from base
        array_pop($path);
        // Step 6b, 6c, 6e: append url while removing "." and ".." from
        // the directory portion
        $end = array_pop($url_path);
        foreach ($url_path as $segment) {
            if ($segment == '.') {
                // skip
            } else if ($segment == '..' && $path && $path[sizeof($path)-1] != '..') {
                array_pop($path);
            } else {
                $path[] = $segment;
            }
        }
        // Step 6d, 6f: remove "." and ".." from file portion
        if ($end == '.') {
            $path[] = '';
        } else if ($end == '..' && $path && $path[sizeof($path)-1] != '..') {
            $path[sizeof($path)-1] = '';
        } else {
            $path[] = $end;
        }
        // Step 6h
        $base['path'] = join('/', $path);
    }
    // Step 7
    return unparse_url($base);
}
?>
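RFC 3986, the successor to RFC 2396, describes the "." and ".." handling of Step 6 as a separate routine, remove_dot_segments (section 5.2.4). Below is a minimal sketch of that routine written for this note, not taken from the comment above. It assumes PHP 7 or later, where substr() returns '' (not false) at the end of a string.

```php
<?php
// Sketch of "remove_dot_segments" from RFC 3986, section 5.2.4.
// Each output entry keeps its leading "/" (if any), so removing the
// last segment together with its "/" is a single array_pop().
function remove_dot_segments($path)
{
    $input  = $path;
    $output = array();
    while ($input !== '') {
        if (strpos($input, '../') === 0) {              // rule A
            $input = substr($input, 3);
        } elseif (strpos($input, './') === 0) {         // rule A
            $input = substr($input, 2);
        } elseif (strpos($input, '/./') === 0) {        // rule B
            $input = substr($input, 2);
        } elseif ($input === '/.') {                    // rule B
            $input = '/';
        } elseif (strpos($input, '/../') === 0) {       // rule C
            $input = substr($input, 3);
            array_pop($output);
        } elseif ($input === '/..') {                   // rule C
            $input = '/';
            array_pop($output);
        } elseif ($input === '.' || $input === '..') {  // rule D
            $input = '';
        } else {                                        // rule E
            // Move the first segment, with its leading "/" if any, to the output
            $next = strpos($input, '/', 1);
            if ($next === false) {
                $output[] = $input;
                $input = '';
            } else {
                $output[] = substr($input, 0, $next);
                $input = substr($input, $next);
            }
        }
    }
    return implode('', $output);
}
```

For example, `remove_dot_segments('/a/b/c/./../../g')` gives '/a/g', matching the example in the RFC.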

Antti Haapala (2007-07-17 01:42:01)

Actually the behaviour noticed by the previous poster is quite correct. When the URI scheme is not present, it is plain wrong to assume that something starting with www. is a domain name, and that the scheme is HTTP. Internet Explorer does it that way, sure, but it does not make it any more correct. The documentation says that the function tries to decode the URL as well as it can, and the only sensible and standards-compliant way to decode such URL is to expect it to be a relative URI.

Elliott Brueggeman (2007-06-03 15:59:52)

Note that if you pass this function a URL without a scheme (www.php.net, as opposed to http://www.php.net), it will parse the result incorrectly. In my test case it returned the domain in the ['path'] element and nothing in the ['host'] element.
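When you know the input is meant to be a host-based URL, one workaround is to turn it into a network-path reference by prefixing "//", so parse_url() reads the first part as the host. This is a sketch under stated assumptions: it needs PHP 5.4.7 or later (the version that fixed host recognition when the scheme is omitted), parse_url_schemeless() is a hypothetical name, and the "://" test is a heuristic that would wrongly rewrite scheme-only URLs such as "mailto:".

```php
<?php
// Sketch: parse a possibly scheme-less, host-based URL such as "www.php.net".
// Heuristic: if there is no "://" and no leading "//", prepend "//".
function parse_url_schemeless($url)
{
    if (strpos($url, '://') === false && substr($url, 0, 2) !== '//') {
        $url = '//' . $url;
    }
    return parse_url($url);
}
```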

Marc-Antoine Ross (2007-03-14 08:10:13)

Do not look for the fragment in $_SERVER['QUERY_STRING']; you will not find it, because the browser never sends the fragment to the server. Read the fragment on the client side instead, for example in JavaScript.

alistair at 21degrees dot com dot au (2006-10-23 19:21:24)

Here's a simple function that adds the $component option for PHP 4. I haven't done exhaustive testing, but it should work OK.

<?php
## Defines only available in PHP 5, created for PHP 4 (same values as in PHP 5)
if (!defined('PHP_URL_SCHEME'))   define('PHP_URL_SCHEME', 0);
if (!defined('PHP_URL_HOST'))     define('PHP_URL_HOST', 1);
if (!defined('PHP_URL_PORT'))     define('PHP_URL_PORT', 2);
if (!defined('PHP_URL_USER'))     define('PHP_URL_USER', 3);
if (!defined('PHP_URL_PASS'))     define('PHP_URL_PASS', 4);
if (!defined('PHP_URL_PATH'))     define('PHP_URL_PATH', 5);
if (!defined('PHP_URL_QUERY'))    define('PHP_URL_QUERY', 6);
if (!defined('PHP_URL_FRAGMENT')) define('PHP_URL_FRAGMENT', 7);

function parse_url_compat($url, $component = NULL) {
    ## PHP_URL_SCHEME is 0, so compare with NULL instead of testing for truthiness
    if ($component === NULL) return parse_url($url);
    ## The $component parameter exists since PHP 5.1.2
    if (version_compare(PHP_VERSION, '5.1.2', '>=')) return parse_url($url, $component);
    ## PHP 4: pick the component out of the full array, NULL if it is missing
    $bits = parse_url($url);
    $keys = array(
        PHP_URL_SCHEME => 'scheme', PHP_URL_HOST => 'host', PHP_URL_PORT => 'port',
        PHP_URL_USER => 'user', PHP_URL_PASS => 'pass', PHP_URL_PATH => 'path',
        PHP_URL_QUERY => 'query', PHP_URL_FRAGMENT => 'fragment'
    );
    if (!isset($keys[$component])) return NULL;
    $key = $keys[$component];
    return isset($bits[$key]) ? $bits[$key] : NULL;
}
?>

TheShadow (2004-12-30 12:36:16)

You may want to check out the PEAR Net_URL class. It provides an easy way to manipulate URL strings. http://pear.php.net/package/Net_URL
