2 April 2020

Sometimes we need to find only those matches for a pattern that are followed or preceded by another pattern.

There’s a special syntax for that, called “lookahead” and “lookbehind”, together referred to as “lookaround”.

To start, let’s find the price in a string like `1 turkey costs 30€`. That is: a number followed by the `€` sign.

The syntax is `X(?=Y)`. It means "look for `X`, but match only if followed by `Y`". Any pattern may be used in place of `X` and `Y`.

For an integer number followed by `€`, the regexp will be `\d+(?=€)`:

``````let str = "1 turkey costs 30€";

alert( str.match(/\d+(?=€)/) ); // 30, the number 1 is ignored, as it's not followed by €``````

Please note: the lookahead is merely a test; the contents of the parentheses `(?=...)` are not included in the result `30`.

When we look for `X(?=Y)`, the regular expression engine finds `X` and then checks if there’s `Y` immediately after it. If it’s not so, then the potential match is skipped, and the search continues.
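To see that the lookahead only tests and does not consume anything, here’s a small sketch comparing `\d+(?=€)` with the plain pattern `\d+€`. It uses `console.log` instead of `alert` so that it also runs outside the browser; the variable names are just for illustration.

```javascript
const str = "1 turkey costs 30€";

const withLookahead = str.match(/\d+(?=€)/);
const withoutLookahead = str.match(/\d+€/);

console.log(withLookahead[0]);    // 30
console.log(withoutLookahead[0]); // 30€

// the character right after the lookahead match is still the untouched €
const nextChar = str[withLookahead.index + withLookahead[0].length];
console.log(nextChar); // €
```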

More complex tests are possible, e.g. `X(?=Y)(?=Z)` means:

1. Find `X`.
2. Check if `Y` is immediately after `X` (skip if it isn’t).
3. Check if `Z` is also immediately after `X` (skip if it isn’t).
4. If both tests passed, then the `X` is a match, otherwise continue searching.

In other words, such a pattern means that we’re looking for `X` followed by `Y` and `Z` at the same time.

That’s only possible if patterns `Y` and `Z` aren’t mutually exclusive.

For example, `\d+(?=\s)(?=.*30)` looks for `\d+` only if it’s followed by a space, and there’s `30` somewhere after it:

``````let str = "1 turkey costs 30€";

alert( str.match(/\d+(?=\s)(?=.*30)/) ); // 1``````

In our string that exactly matches the number `1`.
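A quick sketch of two consequences, using the same string: the order of the lookaheads doesn’t change the result, and mutually exclusive tests can never match. The variable names are only illustrative.

```javascript
const str = "1 turkey costs 30€";

// the same two tests in the opposite order
const swapped = str.match(/\d+(?=.*30)(?=\s)/);
console.log(swapped[0]); // 1

// a number can't be followed by a space and by € at the same time
const impossible = str.match(/\d+(?=\s)(?=€)/);
console.log(impossible); // null
```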

Let’s say that we want a quantity instead, not a price from the same string. That’s a number `\d+`, NOT followed by `€`.

For that, a negative lookahead can be applied.

The syntax is: `X(?!Y)`, it means "search for `X`, but only if not followed by `Y`".

``````let str = "2 turkeys cost 60€";

alert( str.match(/\d+(?!€)/) ); // 2 (the price is skipped)``````
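One thing to watch for, shown here as a sketch: with the `g` flag the engine may backtrack inside a number. In `60€` the full `60` fails the test, but `6` is followed by `0`, not by `€`, so it passes. Adding `\d` to the negative lookahead is one possible way to prevent that.

```javascript
const str = "2 turkeys cost 60€";

// "6" sneaks in: it is followed by "0", which is not €
const naive = str.match(/\d+(?!€)/g);
console.log(naive); // [ '2', '6' ]

// also forbid a following digit, so a match can't stop mid-number
const strict = str.match(/\d+(?![\d€])/g);
console.log(strict); // [ '2' ]
```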

## Lookbehind

Lookbehind is similar, but it looks behind. That is, it allows us to match a pattern only if there’s something before it.

The syntax is:

• Positive lookbehind: `(?<=Y)X`, matches `X`, but only if there’s `Y` before it.
• Negative lookbehind: `(?<!Y)X`, matches `X`, but only if there’s no `Y` before it.

For example, let’s change the price to US dollars. The dollar sign is usually before the number, so to look for `$30` we’ll use `(?<=\$)\d+` – an amount preceded by `$`:

``````let str = "1 turkey costs $30";

// the dollar sign is escaped \$
alert( str.match(/(?<=\$)\d+/) ); // 30 (skipped the sole number)``````

And, if we need the quantity – a number, not preceded by `$`, then we can use a negative lookbehind `(?<!\$)\d+`:

``````let str = "2 turkeys cost $60";

alert( str.match(/(?<!\$)\d+/) ); // 2 (skipped the price)``````
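As a sketch, here are both lookbehinds with the `g` flag on a longer string (the string is made up for this example). For the quantities, `\d` is added to the negative lookbehind so that a match can’t start in the middle of a price. Lookbehind was added to the language in ES2018, so an old engine may reject these patterns with a syntax error.

```javascript
const str = "3 apples cost $5, 12 pears cost $40";

// numbers right after a dollar sign
const prices = str.match(/(?<=\$)\d+/g);
console.log(prices); // [ '5', '40' ]

// numbers that start neither after $ nor after another digit
const quantities = str.match(/(?<![$\d])\d+/g);
console.log(quantities); // [ '3', '12' ]
```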

## Capturing groups

Generally, the contents inside lookaround parentheses do not become a part of the result.

E.g. in the pattern `\d+(?=€)`, the `€` sign doesn’t get captured as a part of the match. That’s natural: we look for a number `\d+`, while `(?=€)` is just a test that it should be followed by `€`.

But in some situations we might want to capture the lookaround expression as well, or a part of it. That’s possible. Just wrap that part into additional parentheses.

In the example below the currency sign `(€|kr)` is captured, along with the amount:

``````let str = "1 turkey costs 30€";
let regexp = /\d+(?=(€|kr))/; // extra parentheses around €|kr

alert( str.match(regexp) ); // 30, €``````

And here’s the same for lookbehind:

``````let str = "1 turkey costs $30";
let regexp = /(?<=(\$|£))\d+/;

alert( str.match(regexp) ); // 30, $``````
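If numbered groups get hard to follow, a named group can be used inside the lookaround in the same way. A minimal sketch, where the group name `currency` is our own choice:

```javascript
const str = "1 turkey costs 30€";

// (?<currency>...) is a named capturing group placed inside the lookahead
const match = str.match(/\d+(?=(?<currency>€|kr))/);

console.log(match[0]);              // 30
console.log(match.groups.currency); // €
```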

## Summary

Lookahead and lookbehind (commonly referred to as “lookaround”) are useful when we’d like to match something depending on the context before/after it.

For simple regexps we can do a similar thing manually. That is: match everything, in any context, and then filter by context in a loop.

Remember, `str.match` (without flag `g`) and `str.matchAll` (always) return matches as arrays with an `index` property, so we know exactly where in the text each match is, and can check the context.
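The manual approach could look roughly like this. It’s only a sketch: it finds every number with `\d+` and then uses `index` to check whether the next character is `€`. The names `prices` and `next` are illustrative.

```javascript
const str = "1 turkey costs 30€";

const prices = [];
for (const m of str.matchAll(/\d+/g)) {
  // the character right after the current match
  const next = str[m.index + m[0].length];
  if (next === "€") prices.push(m[0]);
}

console.log(prices); // [ '30' ]
```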

But generally lookaround is more convenient.

Lookaround types:

| Pattern | Type | Matches |
|---|---|---|
| `X(?=Y)` | Positive lookahead | `X` if followed by `Y` |
| `X(?!Y)` | Negative lookahead | `X` if not followed by `Y` |
| `(?<=Y)X` | Positive lookbehind | `X` if after `Y` |
| `(?<!Y)X` | Negative lookbehind | `X` if not after `Y` |

## Tasks

### Find non-negative integers

There’s a string of integer numbers.

Create a regexp that looks for only non-negative ones (zero is allowed).

An example of use:

``````let regexp = /your regexp/g;

let str = "0 12 -5 123 -18";

alert( str.match(regexp) ); // 0, 12, 123``````

The regexp for an integer number is `\d+`.

We can exclude negatives by prepending it with a negative lookbehind: `(?<!-)\d+`.

Although, if we try it now, we may notice one more “extra” result:

``````let regexp = /(?<!-)\d+/g;

let str = "0 12 -5 123 -18";

console.log( str.match(regexp) ); // 0, 12, 123, 8``````

As you can see, it matches `8`, from `-18`. To exclude it, we need to ensure that the regexp starts matching a number not from the middle of another (non-matching) number.

We can do it by specifying another negative lookbehind: `(?<!-)(?<!\d)\d+`. Now `(?<!\d)` ensures that a match does not start after another digit, just what we need.

We can also join them into a single lookbehind here:

``````let regexp = /(?<![-\d])\d+/g;

let str = "0 12 -5 123 -18";

alert( str.match(regexp) ); // 0, 12, 123``````
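For reuse, the final regexp can be wrapped in a small helper. This is just a sketch: the function name `findNonNegative` is hypothetical, and the `?? []` part is there because `str.match` with the `g` flag returns `null` when nothing matches.

```javascript
function findNonNegative(text) {
  // a match may start neither after a minus sign nor after another digit
  return text.match(/(?<![-\d])\d+/g) ?? [];
}

console.log(findNonNegative("0 12 -5 123 -18")); // [ '0', '12', '123' ]
console.log(findNonNegative("-1 -22"));          // []
```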

We have a string with an HTML document.

Write a regular expression that inserts `<h1>Hello</h1>` immediately after the `<body>` tag. The tag may have attributes.

For instance:

``````let regexp = /your regular expression/;

let str = `
<html>
<body style="height: 200px">
...
</body>
</html>
`;

str = str.replace(regexp, `<h1>Hello</h1>`);``````

After that the value of `str` should be:

``````<html>
<body style="height: 200px"><h1>Hello</h1>
...
</body>
</html>``````

In order to insert after the `<body>` tag, we must first find it. We can use the regular expression pattern `<body.*>` for that.

In this task we don’t need to modify the `<body>` tag. We only need to add the text after it.

Here’s how we can do it:

``````let str = '...<body style="...">...';
str = str.replace(/<body.*>/, '$&<h1>Hello</h1>');``````

In the replacement string `$&` means the match itself, that is, the part of the source text that corresponds to `<body.*>`. It gets replaced by itself plus `<h1>Hello</h1>`.

An alternative is to use lookbehind:

``````let str = '...<body style="...">...';
str = str.replace(/(?<=<body.*>)/, `<h1>Hello</h1>`);``````

As you can see, there’s only a lookbehind part in this regexp.

It works like this:

• At every position in the text.
• Check if it’s preceded by `<body.*>`.
• If it’s so then we have the match.

The tag `<body.*>` won’t be returned. The result of this regexp is literally an empty string, but it matches only at positions preceded by `<body.*>`.

So we replace the empty string, preceded by `<body.*>`, with `<h1>Hello</h1>`. That’s the insertion after `<body>`.

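P.S. Regexp flags, such as `s` and `i`, can also be useful: `/<body.*?>/si`. The `s` flag makes the dot `.` match a newline character, so the quantifier here is made lazy (`.*?`) to stop at the first `>` rather than at the last one in the document. The `i` flag makes `<body>` also match `<BODY>` case-insensitively.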
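Here’s a sketch that runs both solutions on the document from the task and checks that they agree. It also shows a case the examples above don’t cover: on a single-line document the greedy `.*` runs to the last `>`, and a lazy `.*?` is one way around that. The sample strings and variable names are our own.

```javascript
const page = `
<html>
<body style="height: 200px">
...
</body>
</html>
`;

const viaMatch = page.replace(/<body.*>/, "$&<h1>Hello</h1>");
const viaLookbehind = page.replace(/(?<=<body.*>)/, "<h1>Hello</h1>");

console.log(viaMatch === viaLookbehind); // true
console.log(viaMatch.includes('<body style="height: 200px"><h1>Hello</h1>')); // true

// single-line input: greedy .* swallows everything up to the last ">"
const oneLine = "<body><p>Hi</p></body>";
const greedy = oneLine.replace(/<body.*>/, "$&<h1>Hello</h1>");
const lazy = oneLine.replace(/<body.*?>/, "$&<h1>Hello</h1>");

console.log(greedy); // <body><p>Hi</p></body><h1>Hello</h1>
console.log(lazy);   // <body><h1>Hello</h1><p>Hi</p></body>
```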
