Character Classes
Character classes distinguish between kinds of characters, for example between digits and letters.
Let’s start with a practical case. Imagine you have a phone number like +3(522)865-42-76 and wish to turn it into pure digits (35228654276). To do that, you need to find and remove everything that is not a digit. Character classes help with exactly that.
A character class is a special notation that matches any symbol from a certain set.
We will start with the “digit” class. It is written as \d and matches any single digit.
In the example below, let’s find the first digit:
let str = "+3(522)865-42-76";
let regexp = /\d/;
console.log(str.match(regexp)); // 3
Without the g flag, the regular expression looks only for the first match, that is, the first digit matching \d.
Adding the g flag will enable finding all the digits, like this:
let str = "+3(522)865-42-76";
let regexp = /\d/g;
console.log(str.match(regexp)); // array of matches: 3,5,2,2,8,6,5,4,2,7,6
// the digits-only phone number of them:
console.log(str.match(regexp).join('')); // 35228654276
So \d is the character class for digits. There are other character classes, too.
The most used character classes are as follows:
\d (from “digit”): a digit, that is, a character from 0 to 9.
\s (from “space”): a whitespace character. This includes spaces, tabs (\t), newlines (\n), and a few rarer characters such as \v, \f, and \r.
\w (from “word”): a “word” character, which is a letter of the Latin alphabet, a digit, or an underscore (_). Non-Latin letters do not belong to this class.
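As a rough illustration of the three classes listed above, the sketch below runs each one against the same string. The sample string and the variable names are made up for this example and are not part of the original text.

```javascript
// Hypothetical sample string, chosen only to exercise each class.
const sample = "Room 42_b\tis open";

const digits = sample.match(/\d/g);    // every digit
const spaces = sample.match(/\s/g);    // every whitespace character
const wordChars = sample.match(/\w/g); // Latin letters, digits, underscores

console.log(digits);             // [ '4', '2' ]
console.log(spaces.length);      // 3 (a space, a tab, a space)
console.log(wordChars.join("")); // Room42_bisopen

// A non-Latin letter is not a \w character.
const matchesCyrillic = /\w/.test("я");
console.log(matchesCyrillic);    // false
```

Note that the tab counts as whitespace, and the underscore counts as a word character.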
A regular expression can contain ordinary characters as well as character classes.
Let’s see an example where CSS\d corresponds to a string CSS with a digit following it:
let str = "It is CSS3?";
let regexp = /CSS\d/;
console.log(str.match(regexp)); // CSS3
Multiple character classes can be used, like this:
console.log("It is HTML5!".match(/\s\w\w\w\w\d/)); // ' HTML5'
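Combining literal characters with several classes is enough for simple extraction tasks. The following is a minimal sketch that looks for times written as two digits, a colon, and two digits. The sample text and the names schedule, timePattern, and times are assumptions made for this example.

```javascript
// Hypothetical sample text used only for illustration.
const schedule = "Breakfast at 09:00, lunch at 13:30";

// Two digits, a literal colon, then two more digits.
const timePattern = /\d\d:\d\d/g;

const times = schedule.match(timePattern);
console.log(times); // [ '09:00', '13:30' ]
```

This pattern only checks the shape of the text. It would also accept something like 99:99, so it should not be treated as a time validator.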
There is an “inverse class” for every character class, denoted with the same letter in uppercase.
“Inverse” means that it matches all other characters:
\D - non-digit. It matches any character except \d (for instance, a letter).
\S - non-space. It matches any character except \s (for instance, a letter).
\W - non-word character. It matches anything except \w (for instance, a non-Latin letter or a space).
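The inverse classes offer a shorter route to the phone number goal from the start of this page: instead of collecting the digits and joining them, remove everything that is not a digit. The sketch below assumes a helper named digitsOnly, which is a made-up name for this example.

```javascript
// Hypothetical helper: removes every non-digit character from a string.
function digitsOnly(phone) {
  return phone.replace(/\D/g, "");
}

const cleaned = digitsOnly("+3(522)865-42-76");
console.log(cleaned); // 35228654276

// The other inverse classes behave the same way.
const nonSpaces = "a b".match(/\S/g);    // everything except whitespace
const nonWord = "Hi, you!".match(/\W/g); // everything except \w characters

console.log(nonSpaces); // [ 'a', 'b' ]
console.log(nonWord);   // [ ',', ' ', '!' ]
```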
A Dot
A dot (.) is a special character class that matches “any character except a newline”.
For example:
console.log("W".match(/./)); // W
In the example below, the dot is in the middle of a regexp:
let regexp = /HTM.5/;
console.log("HTML5".match(regexp)); // HTML5
console.log("HTM-5".match(regexp)); // HTM-5
console.log("HTM 5".match(regexp)); // HTM 5 (a space is a character, too)
So the dot means “any character”, but not “the absence of a character”.
There must be a character for it to match, as shown here:
console.log("HTM5".match(/HTM.5/)); // null, no match, as there's no character for the dot
By default, a dot does not match the newline character \n.
For example, the regexp W.D matches W, then any character except a newline, and then D:
console.log("W\nD".match(/W.D/)); // null (no match)
Sometimes you want a dot to mean literally “any character”, including a newline.
The s flag is used for that. If a regexp has it, a dot matches any character at all, like this:
console.log("W\nD".match(/W.D/s)); // W\nD (match!)
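The sketch below compares the dot with and without the s flag. It also shows a common workaround for environments that lack the flag: the set [\s\S], meaning “a space or a non-space”. Bracketed sets are not covered in this section, so treat that part as an aside. The variable names are assumptions made for this example.

```javascript
const text = "W\nD";

// Without the s flag the dot refuses to match the newline.
const plainDot = /W.D/;
console.log(plainDot.test(text)); // false

// With the s flag the dot matches any character.
const withFlag = /W.D/s;
console.log(withFlag.dotAll);     // true: the s flag is set
console.log(withFlag.test(text)); // true

// Workaround without the flag: whitespace or non-whitespace covers everything.
const withClasses = /W[\s\S]D/;
console.log(withClasses.test(text)); // true
```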
Pay special attention to spaces. To a human reader, the strings 1-5 and 1 - 5 look nearly identical. But if a regexp doesn’t take the spaces into account, it will fail to match.
Here is an attempt to find two digits separated by a hyphen:
console.log("1 - 5".match(/\d-\d/)); // null, no match!
Now, let’s fix it by adding spaces in the regular expression \d - \d, like here:
console.log("1 - 5".match(/\d - \d/)); // 1 - 5, now it works
// or we can use \s class:
console.log("1 - 5".match(/\d\s-\s\d/)); // 1 - 5, also works
A space is a character, equal in importance to any other. You cannot add or remove spaces from a regexp and expect it to work the same way. In a regexp, every character matters.
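To see that spaces cut both ways, the sketch below checks two patterns against two strings. Each pattern matches only the string whose spacing it mirrors. The names tight and spaced are made up for this example.

```javascript
const tight = /\d-\d/;    // digit, hyphen, digit, with no spaces
const spaced = /\d - \d/; // digit, space, hyphen, space, digit

console.log(tight.test("1-5"));    // true
console.log(spaced.test("1-5"));   // false: the pattern demands spaces
console.log(tight.test("1 - 5"));  // false: the pattern forbids spaces
console.log(spaced.test("1 - 5")); // true
```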