Capturing Groups
This chapter covers another useful feature of JavaScript regular expressions: capturing groups. A part of a pattern can be enclosed in parentheses, which makes it possible to capture parts of a string and put them into an array.
Enclosing part of a pattern in parentheses has two primary effects:
It allows getting a part of the match as a separate item in the result array.
If a quantifier is placed after the parentheses, it applies to the parentheses as a whole.
Examples of Using Parentheses
Now, let's see how parentheses work.
Take the string "dododo" as an example.
Without parentheses, the pattern do+ means the character d followed by o repeated one or more times, for example doooo or dooooooooo.
Parentheses group characters together, so (do)+ matches do, dodo, dododo, and so on, as in the example below:
console.log('Dododo'.match(/(do)+/i)[0]); // "Dododo"
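To make the difference concrete, here is a minimal sketch that runs both patterns on the two kinds of strings discussed above. The variable names are illustrative only.

```javascript
// Without parentheses, + applies only to the preceding character "o".
const plainOnLongO = 'doooo'.match(/do+/)[0];
const plainOnRepeat = 'dododo'.match(/do+/)[0];

// With parentheses, + applies to the whole group "do".
const groupMatch = 'dododo'.match(/(do)+/);

console.log(plainOnLongO);  // "doooo"
console.log(plainOnRepeat); // "do"
console.log(groupMatch[0]); // "dododo"
console.log(groupMatch[1]); // "do" (the last repetition of the group)
```

Note that a repeated group keeps only its last repetition in the result array.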
Domain
Now, let's look for a website domain using a regular expression.
For instance:
email.com
users.email.com
roberts.users.email.com
So, a domain consists of repeated words, with a dot after each of them except the last one.
In regular expressions, that is (\w+\.)+\w+:
let regexp = /(\w+\.)+\w+/g;
console.log("email.com my.email.com".match(regexp)); // email.com,my.email.com
The search works, but the pattern cannot match a domain with a hyphen, such as my-site.com, because the hyphen does not belong to the \w class.
This can be fixed by replacing \w with [\w-] in every word except the last one: ([\w-]+\.)+\w+.
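A small sketch of the hyphen fix, comparing the two patterns on the same input. The domain my-site.com is a made-up example.

```javascript
const simpleDomain = /(\w+\.)+\w+/g;     // words made of \w only
const hyphenDomain = /([\w-]+\.)+\w+/g;  // hyphens allowed, except in the last word

const text = 'my-site.com';

const withSimple = text.match(simpleDomain);
const withHyphen = text.match(hyphenDomain);

console.log(withSimple); // only the part after the hyphen: "site.com"
console.log(withHyphen); // the whole domain: "my-site.com"
```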
Let's create a regular expression for emails, based on the previous example. The format of an email is name@domain. The name can be any word; hyphens and dots are also allowed. In a regexp, that is [-.\w]+.
The pattern will be as follows:
let regexp = /[-.\w]+@([\w-]+\.)+[\w-]+/g;
console.log("user@example.com @ first.last@mail.example.org".match(regexp)); // user@example.com, first.last@mail.example.org
This regexp is not perfect, but it mostly works and helps catch accidental mistypes.
Parentheses Contents in the Match
Parentheses are numbered from left to right. The engine remembers the content matched by each of them and makes it available in the result.
The str.match(regexp) method, when regexp has no g flag, looks for the first match and returns it as an array:
At index 0: the full match.
At index 1: the contents of the first parentheses.
At index 2: the contents of the second parentheses.
As an example, let's find HTML tags and process them.
As a first step, wrap the tag content in parentheses: <(.*?)>.
Then both the whole tag <p> and its contents p are available in the resulting array:
let str = '<p>Welcome to w3cdoc</p>';
let tag = str.match(/<(.*?)>/);
alert(tag[0]); // <p>
alert(tag[1]); // p
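The index layout described above can be sketched with a pattern that has two groups. The input string 'width=100' and the variable names are assumptions made for illustration.

```javascript
const setting = 'width=100';
const result = setting.match(/(\w+)=(\d+)/);

console.log(result[0]);    // "width=100" (full match)
console.log(result[1]);    // "width" (first parentheses)
console.log(result[2]);    // "100" (second parentheses)
console.log(result.index); // 0 (position of the match in the string)

// Array destructuring skips the full match and names the groups.
const [, key, value] = result;
console.log(key, value);   // width 100
```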
Nested Groups
Parentheses can be nested. In that case, the numbering also goes from left to right, by the opening paren.
When searching for a tag in <p class="myClass">, you may be interested in the whole tag content (p class="myClass"), the tag name (p), and the tag attributes (class="myClass").
Adding parentheses for each of them gives: <(([a-z]+)\s*([^>]*))>
Here is how it works:
let str = '<p class="myClass">';
let regexp = /<(([a-z]+)\s*([^>]*))>/;
let res = str.match(regexp);
alert(res[0]); // <p class="myClass">
alert(res[1]); // p class="myClass"
alert(res[2]); // p
alert(res[3]); // class="myClass"
The zero index of the result always holds the full match.
The first group is returned as res[1]. It encloses the whole tag content.
Then res[2] holds the group from the second opening paren, ([a-z]+), which is the tag name, and res[3] holds the third group, ([^>]*), which is the tag attributes.
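The same counting rule can be checked on another nested pattern. This is only a sketch; the date string and the pattern are hypothetical and chosen to keep the groups easy to tell apart.

```javascript
// Groups are numbered by the position of their opening paren:
// 1: ((\d{4})-(\d{2}))   2: (\d{4})   3: (\d{2})   4: the final (\d{2})
const nested = /((\d{4})-(\d{2}))-(\d{2})/;
const parts = '2020-04-20'.match(nested);

console.log(parts[0]); // "2020-04-20" (full match)
console.log(parts[1]); // "2020-04"    (outer group)
console.log(parts[2]); // "2020"       (first inner group)
console.log(parts[3]); // "04"         (second inner group)
console.log(parts[4]); // "20"         (group after the nested ones)
```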
Optional Groups
Even if an optional group does not take part in the match, the corresponding result array item is present and equals undefined.
For example, consider the regular expression a(z)?(c)?. Running it on the single-letter string a gives this result:
let m = 'a'.match(/a(z)?(c)?/);
console.log(m.length); // 3
console.log(m[0]); // a (whole match)
console.log(m[1]); // undefined
console.log(m[2]); // undefined
The length of the array is 3, but both groups are undefined.
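For comparison, here is a sketch of the same pattern on the string 'ac', where only the second optional group takes part in the match. The fallback text '(no z)' is an arbitrary choice for illustration.

```javascript
const partial = 'ac'.match(/a(z)?(c)?/);

console.log(partial.length); // 3
console.log(partial[0]);     // "ac" (whole match)
console.log(partial[1]);     // undefined, because (z)? matched nothing
console.log(partial[2]);     // "c"

// A missing group can be replaced with a fallback before use.
const zPart = partial[1] === undefined ? '(no z)' : partial[1];
console.log(zPart);          // "(no z)"
```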
Searching for All Matches: matchAll
Note first that matchAll is a relatively new method and is not supported by old browsers, so a polyfill may be required.
When searching for all matches (g flag), the match method does not return the contents of groups.
The example below attempts to find all tags in a string:
let str = '<p> <span>';
let tags = str.match(/<(.*?)>/g);
alert(tags); // <p>,<span>
The result is an array of matches, but without any details about each of them.
In practice, however, the contents of the capturing groups are usually needed in the result.
To get them, use the str.matchAll(regexp) method, which was added to JavaScript long after the match method. It differs from match in three important ways:
It returns an iterable object rather than an array.
When the g flag is present, it returns every match as an array with groups.
If there are no matches, it returns an empty iterable object rather than null.
Here is an example:
let result = '<p> <span>'.matchAll(/<(.*?)>/gi);
// result isn't an array, but an iterable object
console.log(String(result)); // [object RegExp String Iterator]
console.log(result[0]); // undefined
result = Array.from(result); // let's turn it into array
alert(result[0]); // <p>,p (1st tag)
alert(result[1]); // <span>,span (2nd tag)
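Since the result is iterable, converting it to an array is optional. As a sketch, the loop below reads each match directly with for..of and destructuring; the variable names are illustrative.

```javascript
const html = '<p> <span>';
const tagNames = [];

// Each item is a match array: [full match, group 1, ...]
for (const [fullTag, name] of html.matchAll(/<(.*?)>/g)) {
  console.log(`${fullTag} -> ${name}`); // "<p> -> p", then "<span> -> span"
  tagNames.push(name);
}

// With no matches, the iterable is simply empty (not null).
const noTags = Array.from('plain text'.matchAll(/<(.*?)>/g));
console.log(noTags.length); // 0
```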
Named Groups
It is not easy to remember groups by their numbers. That is feasible for simple patterns, but counting parentheses is inconvenient for complex ones. A better option is to give the parentheses names.
You can do it by putting ?<name> right after the opening paren.
Here is an example of searching for a date:
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
let str = "2020-04-20";
let groups = str.match(dateRegexp).groups;
console.log(groups.year); // 2020
console.log(groups.month); // 04
console.log(groups.day); // 20
The groups reside in the .groups property of the match. To find all dates, add the g flag.
matchAll is then needed to obtain the full matches along with the groups, like this:
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
let str = "2020-04-30 2020-10-01";
let results = str.matchAll(dateRegexp);
for (let result of results) {
  let { year, month, day } = result.groups;
  console.log(`${day}.${month}.${year}`);
  // first: 30.04.2020
  // second: 01.10.2020
}
Capturing Groups in the Replacement
The str.replace(regexp, replacement) method, which replaces matches of regexp in str, allows using parentheses contents in the replacement string. This is done with $n, where n is the group number.
For instance:
let str = "John Smith";
let regexp = /(\w+) (\w+)/;
console.log(str.replace(regexp, '$2, $1')); // Smith, John
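As a sketch of how the g flag interacts with $n, the snippet below swaps the words in every pair rather than only the first one. The input string is a made-up example.

```javascript
const people = 'John Smith, Ann Lee';

// Without g, only the first match is replaced.
const firstOnly = people.replace(/(\w+) (\w+)/, '$2 $1');

// With g, $1 and $2 refer to the groups of each match in turn.
const swappedAll = people.replace(/(\w+) (\w+)/g, '$2 $1');

console.log(firstOnly);  // "Smith John, Ann Lee"
console.log(swappedAll); // "Smith John, Lee Ann"
```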
For named parentheses, the reference is $<name>.
Here is an example:
let regexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
let str = "2020-03-30, 2020-10-01";
console.log(str.replace(regexp, '$<day>.$<month>.$<year>')); // 30.03.2020, 01.10.2020
Summary
A part of a pattern may be enclosed in parentheses. This is called a capturing group. Parentheses groups are numbered from left to right, and they can optionally be named with (?<name>…).
The str.match method returns capturing groups only when the g flag is absent. The str.matchAll method always returns capturing groups.
Also, parentheses contents can be used in the replacement string in str.replace, by number with $n or by name with $<name>.