2022년 10월 11일

이 글은 다음 언어로만 작성되어 있습니다. English, Español, Français, Italiano, 日本語, Русский, Türkçe, Українська, 简体中文. 한국어 번역에 참여해주세요.

Quantifiers +, *, ? and {n}

Let’s say we have a string like +7(903)-123-45-67 and want to find all numbers in it. But unlike before, we are interested not in single digits, but full numbers: 7, 903, 123, 45, 67.

A number is a sequence of 1 or more digits \d. To mark how many we need, we can append a quantifier.

Quantity {n}

The simplest quantifier is a number in curly braces: {n}.

A quantifier is appended to a character (or a character class, or a [...] set etc) and specifies how many we need.

It has a few advanced forms, let’s see examples:

The exact count: {5}

\d{5} denotes exactly 5 digits, the same as \d\d\d\d\d.

The example below looks for a 5-digit number:

alert( "I'm 12345 years old".match(/\d{5}/) ); //  "12345"

We can add \b to exclude longer numbers: \b\d{5}\b.

The range: {3,5}, match 3-5 times

To find numbers from 3 to 5 digits we can put the limits into curly braces: \d{3,5}

alert( "I'm not 12, but 1234 years old".match(/\d{3,5}/) ); // "1234"

We can omit the upper limit.

Then a regexp \d{3,} looks for sequences of digits of length 3 or more:

alert( "I'm not 12, but 345678 years old".match(/\d{3,}/) ); // "345678"

Let’s return to the string +7(903)-123-45-67.

A number is a sequence of one or more digits in a row. So the regexp is \d{1,}:

let str = "+7(903)-123-45-67";

let numbers = str.match(/\d{1,}/g);

alert(numbers); // 7,903,123,45,67

Shorthands

There are shorthands for most used quantifiers:

+

Means “one or more”, the same as {1,}.

For instance, \d+ looks for numbers:

let str = "+7(903)-123-45-67";

alert( str.match(/\d+/g) ); // 7,903,123,45,67

?

Means “zero or one”, the same as {0,1}. In other words, it makes the symbol optional.

For instance, the pattern ou?r looks for o followed by zero or one u, and then r.

So, colou?r finds both color and colour:

let str = "Should I write color or colour?";

alert( str.match(/colou?r/g) ); // color, colour

*

Means “zero or more”, the same as {0,}. That is, the character may repeat any times or be absent.

For example, \d0* looks for a digit followed by any number of zeroes (may be many or none):

alert( "100 10 1".match(/\d0*/g) ); // 100, 10, 1

Compare it with + (one or more):

alert( "100 10 1".match(/\d0+/g) ); // 100, 10
// 1 not matched, as 0+ requires at least one zero

More examples

Quantifiers are used very often. They serve as the main “building block” of complex regular expressions, so let’s see more examples.

Regexp for decimal fractions (a number with a floating point): \d+\.\d+

In action:

alert( "0 1 12.345 7890".match(/\d+\.\d+/g) ); // 12.345

Regexp for an “opening HTML-tag without attributes”, such as <span> or <p>.

The simplest one: /<[a-z]+>/i
```
alert( "<body> ... </body>".match(/<[a-z]+>/gi) ); // <body>
```
The regexp looks for character '<' followed by one or more Latin letters, and then '>'.
Improved: /<[a-z][a-z0-9]*>/i

According to the standard, HTML tag name may have a digit at any position except the first one, like <h1>.
```
alert( "<h1>Hi!</h1>".match(/<[a-z][a-z0-9]*>/gi) ); // <h1>
```

Regexp “opening or closing HTML-tag without attributes”: /<\/?[a-z][a-z0-9]*>/i

We added an optional slash /? near the beginning of the pattern. Had to escape it with a backslash, otherwise JavaScript would think it is the pattern end.

alert( "<h1>Hi!</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>

We can see one common rule in these examples: the more precise is the regular expression – the longer and more complex it is.

For instance, for HTML tags we could use a simpler regexp: <\w+>. But as HTML has stricter restrictions for a tag name, <[a-z][a-z0-9]*> is more reliable.

Can we use <\w+> or we need <[a-z][a-z0-9]*>?

In real life both variants are acceptable. Depends on how tolerant we can be to “extra” matches and whether it’s difficult or not to remove them from the result by other means.

과제

생략 부호 '...'를 어떻게 찾을 수 있을까요?

연속된 점 3개 혹은 그 이상으로 구성되는 생략 부호를 찾기 위한 정규표현식을 만들어보세요.

정답 작성 시, 다음 코드가 정상 동작하게 됩니다.

          let regexp = /your regexp/g;
alert( "Hello!... How goes?.....".match(regexp) ); // ..., .....
        

정답:

let regexp = /\.{3,}/g;
alert( "Hello!... How goes?.....".match(regexp) ); // ..., .....

점은 특수문자이므로 \.을 삽입해 이스케이프 해야 합니다.

HTML에서 쓰이는 색상 검출을 위한 정규표현식

#ABCDEF과 같이 HTML에서 사용하는 색상을 검출할 수 있는 정규표션식을 만들어보세요. 해당 색상은 첫 글자 #과 6개의 16진수로 구성됩니다.

예시:

          let regexp = /...your regexp.../

let str = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2 #12345678";

alert( str.match(regexp) )  // #121212,#AA00ef

참고: 이 과제에선 #123 또는 rgb(1,2,3)같은 다른 색상 포맷은 고려하지 않아도 됩니다.

6개의 16진수가 뒤에 오는 #을 찾아야 합니다.

16진수 문자는 [0-9a-fA-F]와 같이 표현할 수 있습니다. i 플래그를 사용하는 경우엔 [0-9a-f]로도 표현 가능합니다.

이후 수량자(quantifier) {6}를 사용해 6개의 16진수를 찾습니다.

그 결과 /#[a-f0-9]{6}/gi라는 정규표현식이 도출됩니다.

let regexp = /#[a-f0-9]{6}/gi;

let str = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2"

alert( str.match(regexp) );  // #121212,#AA00ef

그런데 이 정규표현식은 6자리보다 긴 16진수가 뒤에 오는 경우도 검출한다는 문제가 있습니다.

alert( "#12345678".match( /#[a-f0-9]{6}/gi ) ) // #123456

이를 해결하려면 정규표현식 끝부분에 \b를 추가하면 됩니다.

// color
alert( "#123456".match( /#[a-f0-9]{6}\b/gi ) ); // #123456

// not a color
alert( "#12345678".match( /#[a-f0-9]{6}\b/gi ) ); // null

튜토리얼 지도

댓글을 달기 전에 마우스를 올렸을 때 나타나는 글을 먼저 읽어주세요.