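A regular expression is a special sequence of characters that helps you match or find other strings, or sets of strings, using a specialized syntax held in a pattern.
A regular expression literal is a pattern written between slashes, or between arbitrary delimiters following %r, as follows:
Syntax: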
    /pattern/
    /pattern/im     # options can be specified
    %r!/usr/local!  # general delimited regular expression
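Example:

    #!/usr/bin/ruby

    line1 = "Cats are smarter than dogs"
    line2 = "Dogs also like meat"

    # =~ returns the index of the first match, or nil when there is none.
    # The pattern is unanchored, so it finds "Cats" anywhere in the line.
    if line1 =~ /Cats(.*)/
      puts "Line1 contains Cats"
    end
    if line2 =~ /Cats(.*)/
      puts "Line2 contains Cats"
    end

This will produce the following result:

    Line1 contains Cats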
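The (.*) group in the pattern also captures whatever follows "Cats". As a minimal sketch (the helper name rest_after_cats is made up for illustration and is not part of the original example), the captured text can be read from the MatchData object that Regexp#match returns:

```ruby
# Hypothetical helper, not part of the original example: returns the text
# captured by the (.*) group, or nil when the line does not match.
def rest_after_cats(line)
  match = /Cats(.*)/.match(line)  # MatchData on success, nil on failure
  match && match[1]               # match[1] is the first capture group
end

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

p(line1 =~ /Cats(.*)/)     # => 0, the index where the match starts
p(line2 =~ /Cats(.*)/)     # => nil, so an if-branch would be skipped
p rest_after_cats(line1)   # => " are smarter than dogs"
p rest_after_cats(line2)   # => nil
```

Because nil is falsy in Ruby, both =~ and match can be used directly as the condition of an if statement.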
Regular-expression modifiers:
A regular expression literal may include optional modifiers to control various aspects of matching. Modifiers are specified after the second slash character, as shown previously, and may be one or more of these characters:
Modifier | Description |
---|---|
i | Ignore case when matching text. |
o | Perform #{} interpolations only once, the first time the regexp literal is evaluated. |
x | Ignore whitespace and allow comments in the regular expression. |
m | Multiline mode: the dot (.) also matches newline characters. |
u, e, s, n | Interpret the regexp as UTF-8, EUC-JP, Windows-31J (SJIS), or ASCII-8BIT, respectively. If none of these modifiers is specified, the regular expression is assumed to use the source encoding. |
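Like string literals delimited with %Q, Ruby allows you to begin a regular expression with %r followed by a delimiter of your choice. This is useful when the pattern contains forward slash characters that you do not want to escape: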
    # Following matches a single slash character, no escape required
    %r|/|

    # Flag characters are allowed with this syntax, too
    %r[</(.*)>]i
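To see the modifiers and the %r form in action, here is a small sketch. The variable names and sample strings are illustrative only, and Regexp#match? assumes Ruby 2.4 or later:

```ruby
# Variable names below are illustrative only.
ignore_case = /ruby/i
dot_all     = /start.*end/m     # m lets . match a newline
extended    = /
  \d{4}   # year
  -
  \d{2}   # month
/x                              # x ignores whitespace and comments
path        = %r{/usr/local}    # no need to escape the slashes
tag         = %r[</(.*)>]i

p ignore_case.match?("RUBY")           # => true
p dot_all.match?("start\nend")         # => true
p(/start.*end/.match?("start\nend"))   # => false without m
p extended.match?("2024-05")           # => true
p path.match?("/usr/local/bin")        # => true
p "</B>"[tag, 1]                       # => "B"
```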
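Regular-expression patterns:
Except for control characters, (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.
The following table lists the regular expression syntax that is available in Ruby.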
Pattern | Description |
---|---|
^ | Matches beginning of line. |
$ | Matches end of line. |
. | Matches any single character except newline. Using the m option allows it to match newline as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
(?imx) | Temporarily toggles on the i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
(?-imx) | Temporarily toggles off the i, m, or x options within a regular expression. If in parentheses, only that area is affected. |
(?:re) | Groups regular expressions without remembering matched text. |
(?imx:re) | Temporarily toggles on the i, m, or x options within parentheses. |
(?-imx:re) | Temporarily toggles off the i, m, or x options within parentheses. |
(?#...) | Comment. |
(?=re) | Positive lookahead: requires that re match at this position, without consuming any characters. |
(?!re) | Negative lookahead: requires that re not match at this position, without consuming any characters. |
(?>re) | Matches an independent (atomic) pattern without backtracking into it. |
\w | Matches word characters. |
\W | Matches nonword characters. |
\s | Matches whitespace. Equivalent to [ \t\r\n\f\v]. |
\S | Matches nonwhitespace. |
\d | Matches digits. Equivalent to [0-9]. |
\D | Matches nondigits. |
\A | Matches beginning of string. |
\Z | Matches end of string. If the string ends with a newline, it matches just before that newline. |
\z | Matches end of string. |
\G | Matches the point where the last match finished. |
\b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
\B | Matches nonword boundaries. |
\n, \r, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\1...\9 | Matches nth grouped subexpression. |
\10 | Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code. |
Regular-expression examples:
Literal characters:
Example | Description |
---|---|
/ruby/ | Match "ruby". |
¥ | Matches Yen sign. Multibyte characters are supported in Ruby 1.9 and Ruby 1.8. |
Character classes:
Example | Description |
---|---|
/[Rr]uby/ | Match "Ruby" or "ruby" |
/rub[ye]/ | Match "ruby" or "rube" |
/[aeiou]/ | Match any one lowercase vowel |
/[0-9]/ | Match any digit; same as /[0123456789]/ |
/[a-z]/ | Match any lowercase ASCII letter |
/[A-Z]/ | Match any uppercase ASCII letter |
/[a-zA-Z0-9]/ | Match any of the above |
/[^aeiou]/ | Match anything other than a lowercase vowel |
/[^0-9]/ | Match anything other than a digit |
Special character classes:
Example | Description |
---|---|
/./ | Match any character except newline |
/./m | In multiline mode . matches newline, too |
/\d/ | Match a digit: /[0-9]/ |
/\D/ | Match a nondigit: /[^0-9]/ |
/\s/ | Match a whitespace character: /[ \t\r\n\f\v]/ |
/\S/ | Match nonwhitespace: /[^ \t\r\n\f\v]/ |
/\w/ | Match a single word character: /[A-Za-z0-9_]/ |
/\W/ | Match a nonword character: /[^A-Za-z0-9_]/ |
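The shorthand classes are easiest to compare side by side on one string. The following sketch uses String#scan on a made-up sample; note that each class is written with a leading backslash:

```ruby
sample = "a1 b_2\tc-3"   # illustrative input only

digits      = sample.scan(/\d/)    # => ["1", "2", "3"]
words       = sample.scan(/\w+/)   # => ["a1", "b_2", "c", "3"]
spaces      = sample.scan(/\s/)    # => [" ", "\t"]
nonwords    = sample.scan(/\W/)    # => [" ", "\t", "-"]
nondigits   = sample.scan(/\D+/)   # => ["a", " b_", "\tc-"]
dot_default = "a\nb".scan(/./)     # => ["a", "b"]
dot_multi   = "a\nb".scan(/./m)    # => ["a", "\n", "b"]

p digits, words, spaces, nonwords, nondigits, dot_default, dot_multi
```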
Repetition cases:
Example | Description |
---|---|
/ruby?/ | Match "rub" or "ruby": the y is optional |
/ruby*/ | Match "rub" plus 0 or more ys |
/ruby+/ | Match "rub" plus 1 or more ys |
/\d{3}/ | Match exactly 3 digits |
/\d{3,}/ | Match 3 or more digits |
/\d{3,5}/ | Match 3, 4, or 5 digits |
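A short sketch of the repetition operators, using made-up sample strings. The \b word-boundary anchors are an addition for this example: without them, /\d{3}/ would also match the first three digits of a longer number.

```ruby
words   = "rub ruby rubyyy"          # illustrative input only
numbers = "7 42 123 12345 1234567"   # illustrative input only

optional_y  = words.scan(/ruby?/)   # => ["rub", "ruby", "ruby"]
zero_or_more = words.scan(/ruby*/)  # => ["rub", "ruby", "rubyyy"]
one_or_more = words.scan(/ruby+/)   # => ["ruby", "rubyyy"]

exactly_3 = numbers.scan(/\b\d{3}\b/)     # => ["123"]
at_least_3 = numbers.scan(/\b\d{3,}\b/)   # => ["123", "12345", "1234567"]
three_to_5 = numbers.scan(/\b\d{3,5}\b/)  # => ["123", "12345"]

p optional_y, zero_or_more, one_or_more, exactly_3, at_least_3, three_to_5
```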
Nongreedy repetition:
This matches the smallest number of repetitions:
Example | Description |
---|---|
/<.*>/ | Greedy repetition: matches all of "<ruby>perl>" |
/<.*?>/ | Nongreedy: matches "<ruby>" in "<ruby>perl>" |
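The difference is easy to check with String#[], which returns the matched substring. A minimal sketch using the same sample string as the table:

```ruby
html = "<ruby>perl>"

greedy    = html[/<.*>/]    # => "<ruby>perl>", .* runs to the last >
nongreedy = html[/<.*?>/]   # => "<ruby>", .*? stops at the first >

p greedy, nongreedy
```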
Grouping with parentheses:
Example | Description |
---|---|
/\D\d+/ | No group: + repeats \d |
/(\D\d)+/ | Grouped: + repeats \D\d pair |
/([Rr]uby(, )?)+/ | Match "Ruby", "Ruby, ruby, ruby", etc. |
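The effect of the parentheses on what + repeats can be sketched as follows (the sample strings are made up for illustration):

```ruby
no_group    = "a1b2c3"[/\D\d+/]     # => "a1", + applies to \d only
grouped     = "a1b2c3"[/(\D\d)+/]   # => "a1b2c3", + repeats the whole pair
many_digits = "a123b"[/\D\d+/]      # => "a123"

list = /([Rr]uby(, )?)+/.match("Ruby, ruby, ruby!")

p no_group, grouped, many_digits
p list[0]   # => "Ruby, ruby, ruby"
p list[1]   # => "ruby", a repeated group keeps its last repetition
```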
Backreferences:
This matches a previously matched group again:
Example | Description |
---|---|
/([Rr])uby&\1ails/ | Match ruby&rails or Ruby&Rails |
/(['"])(?:(?!\1).)*\1/ | Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. |
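A small sketch of both backreference patterns. The variable names and input strings are illustrative, and Regexp#match? assumes Ruby 2.4 or later:

```ruby
same_case = /([Rr])uby&\1ails/
quoted    = /(['"])(?:(?!\1).)*\1/

p same_case.match?("Ruby&Rails")   # => true
p same_case.match?("ruby&rails")   # => true
p same_case.match?("ruby&Rails")   # => false, \1 must repeat the "r"

balanced   = %q{say "it's fine" now}
unbalanced = %q{say 'hi" now}

p balanced[quoted]     # => "\"it's fine\""
p unbalanced[quoted]   # => nil, no closing quote of the same kind
```

In the second pattern, (?!\1). consumes one character at a time as long as it is not the opening quote, which is why the apostrophe inside the double-quoted string is accepted.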
Alternatives:
Example | Description |
---|---|
/ruby|rube/ | Match "ruby" or "rube" |
/rub(y|le)/ | Match "ruby" or "ruble" |
/ruby(!+|\?)/ | "ruby" followed by one or more ! or one ? |
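The alternation patterns can be tried on a small word list. In this sketch the \A and \z anchors and the sample words are additions for illustration, so that only whole words are accepted:

```ruby
words = %w[ruby rube ruble rubber]   # illustrative input only

either = words.grep(/\A(?:ruby|rube)\z/)   # => ["ruby", "rube"]
suffix = words.grep(/\Arub(y|le)\z/)       # => ["ruby", "ruble"]

shout    = "ruby!!!"[/ruby(!+|\?)/]   # => "ruby!!!"
question = "ruby?!"[/ruby(!+|\?)/]    # => "ruby?"
plain    = "ruby."[/ruby(!+|\?)/]     # => nil

p either, suffix, shout, question, plain
```

The question mark has to be escaped as \? inside the group, because an unescaped ? is a repetition operator.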
Anchors:
These specify the match position:
Example | Description |
---|---|
/^Ruby/ | Match "Ruby" at the start of a string or internal line |
/Ruby$/ | Match "Ruby" at the end of a string or line |
/\ARuby/ | Match "Ruby" at the start of a string |
/Ruby\Z/ | Match "Ruby" at the end of a string |
/\bRuby\b/ | Match "Ruby" at a word boundary |
/\brub\B/ | \B is nonword boundary: match "rub" in "rube" and "ruby" but not alone |
/Ruby(?=!)/ | Match "Ruby", if followed by an exclamation point |
/Ruby(?!!)/ | Match "Ruby", if not followed by an exclamation point |
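The line anchors (^ and $) and the string anchors (\A, \Z, \z) differ only on multi-line input, so a two-line sample makes the distinction visible. This is a sketch with made-up input; \z is taken from the pattern table rather than from the examples:

```ruby
text = "Perl\nRuby\n"   # illustrative input only

line_start   = text =~ /^Ruby/    # => 5, ^ matches after the newline
string_start = text =~ /\ARuby/   # => nil, the string starts with "Perl"
line_end     = text =~ /Ruby$/    # => 5
before_nl    = text =~ /Ruby\Z/   # => 5, \Z tolerates one trailing newline
strict_end   = text =~ /Ruby\z/   # => nil, the string ends with "\n"

followed     = "Ruby!"[/Ruby(?=!)/]   # => "Ruby", the ! is not part of the match
not_followed = "Ruby?"[/Ruby(?!!)/]   # => "Ruby"
rejected     = "Ruby!"[/Ruby(?!!)/]   # => nil
inside_word  = "rube"[/\brub\B/]      # => "rub"
whole_word   = "rub"[/\brub\B/]       # => nil

p line_start, string_start, line_end, before_nl, strict_end
p followed, not_followed, rejected, inside_word, whole_word
```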
Special syntax with parentheses:
Example | Description |
---|---|
/R(?#comment)/ | Matches "R". All the rest is a comment |
/R(?i)uby/ | Case-insensitive while matching "uby" |
/R(?i:uby)/ | Same as above |
/rub(?:y|le)/ | Group only without creating \1 backreference |
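A brief sketch of these forms, with illustrative variable names (Regexp#match? assumes Ruby 2.4 or later). MatchData#captures shows whether a group was remembered:

```ruby
inline_on  = /R(?i)uby/     # i applies from (?i) onward
inline_grp = /R(?i:uby)/    # i applies inside the parentheses only

p inline_on.match?("RUBY")      # => true
p inline_on.match?("rUBY")      # => false, the leading R stays case-sensitive
p inline_grp.match?("RUBY")     # => true
p(/R(?#comment)/.match?("R"))   # => true

capturing     = /rub(y|le)/.match("ruble")
non_capturing = /rub(?:y|le)/.match("ruble")

p capturing.captures       # => ["le"]
p non_capturing.captures   # => []
```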
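Search and replace:
The most important String methods that use regular expressions are sub and gsub, together with their in-place variants sub! and gsub!.
All of these methods perform a search-and-replace operation using a regular expression pattern. sub and sub! replace the first occurrence of the pattern; gsub and gsub! replace all occurrences.
sub and gsub return a new string and leave the original unmodified, whereas sub! and gsub! modify the string on which they are called. The bang variants return nil when no substitution is performed.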
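The difference between the plain and the in-place methods shows up in their return values. A minimal sketch with a made-up phone number:

```ruby
original = "2004-959-559"   # illustrative input only

first_only = original.sub(/-/, "")    # => "2004959-559", a new string
all_dashes = original.gsub(/-/, "")   # => "2004959559", a new string
p first_only, all_dashes
p original                            # => "2004-959-559", still unchanged

changed = original.dup
first_call  = changed.gsub!(/-/, "")  # modifies changed in place
second_call = changed.gsub!(/-/, "")  # => nil, nothing left to replace
p changed                             # => "2004959559"
p second_call                         # => nil
```

Because the bang methods can return nil, assigning their result back to the variable (as in text = text.gsub!(...)) is risky; either call them for their side effect or use the non-bang form.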
Here is an example:

    #!/usr/bin/ruby

    phone = "2004-959-559 #This is Phone Number"

    # Delete Ruby-style comments.
    # sub returns a new string; sub! would return nil if nothing matched.
    phone = phone.sub(/#.*$/, "")
    puts "Phone Num : #{phone}"

    # Remove anything other than digits
    phone = phone.gsub(/\D/, "")
    puts "Phone Num : #{phone}"

This will produce the following result:

    Phone Num : 2004-959-559
    Phone Num : 2004959559
Here is another example:

    #!/usr/bin/ruby

    text = "rails are rails, really good Ruby on Rails"

    # Change "rails" to "Rails" throughout
    text.gsub!("rails", "Rails")

    # Capitalize the word "rails" throughout (whole words only)
    text.gsub!(/\brails\b/, "Rails")

    puts text

This will produce the following result:

    Rails are Rails, really good Ruby on Rails