Perl Cookbook

Chapter 6: Pattern Matching

6.5. Finding the Nth Occurrence of a Match

Problem

You want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of "fish":





One fish two fish red fish blue fish



Solution

Use the /g modifier in a while loop, keeping count of matches:

    $WANT  = 3;
    $count = 0;
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
            # Warning: don't `last' out of this loop
        }
    }



The third fish is a red one.



Or use a repetition count and repeated pattern like this:

/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
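The repetition count in that pattern is simply N - 1, so it can be interpolated from a variable. The following is a minimal sketch under that assumption; the helper name nth_fish_word is hypothetical and not part of the original recipe.

```perl
use strict;
use warnings;

# Hypothetical helper: return the word preceding the Nth "fish", or undef.
sub nth_fish_word {
    my ($string, $n) = @_;
    my $skip = $n - 1;    # complete "<word> fish" pairs to step over first
    return $string =~ /(?:\w+\s+fish\s+){$skip}(\w+)\s+fish/i ? $1 : undef;
}

my $pond = 'One fish two fish red fish blue fish';
print "Fish number $_ is ", nth_fish_word($pond, $_), ".\n" for 1 .. 4;
```

Asking for a fish beyond the last one makes the pattern fail, so the helper returns undef rather than a wrong word.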

Discussion

As explained in the chapter introduction, using the /g modifier in scalar context creates something of a progressive match, useful in while loops. This is commonly used to count the number of times a pattern matches in a string:

    # simple way with while loop
    $count = 0;
    while ($string =~ /PAT/g) {
        $count++;               # or whatever you'd like to do here
    }

    # same thing with trailing while
    $count = 0;
    $count++ while $string =~ /PAT/g;

    # or with for loop
    for ($count = 0; $string =~ /PAT/g; $count++) { }

    # Similar, but this time count overlapping matches
    $count++ while $string =~ /(?=PAT)/g;
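The warning in the Solution about not using last follows from how the progressive match works: the string remembers where the previous /g match ended, and that position is cleared only when a match finally fails. The sketch below illustrates this with pos; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish';

# Leave the loop early, right after the second match.
my $count = 0;
while ($pond =~ /(\w+)\s+fish\b/gi) {
    last if ++$count == 2;
}
my $resume_at = pos($pond);    # still set, because no match has failed yet

# A later scalar-context /g match on the same string resumes from there.
my $next = $pond =~ /(\w+)\s+fish\b/gi ? $1 : undef;
print "Resumed scan found: $next\n";

# Resetting pos makes the next /g match start from the beginning again.
pos($pond) = 0;
my $first = $pond =~ /(\w+)\s+fish\b/gi ? $1 : undef;
print "After reset found: $first\n";
```

If you do need to leave such a loop early, resetting pos afterward (or matching against a copy of the string) keeps later /g matches from starting in the middle.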

To find the Nth match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every Nth match by checking for multiples of N using the modulus operator. For example, (++$count % 3) == 0 would be every third match.
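The modulus test can be sketched as a small helper. The name every_nth_match and the longer pond string are assumptions made for illustration; the pattern is expected to capture the wanted text in its first group.

```perl
use strict;
use warnings;

# Hypothetical helper: collect capture 1 of every $n-th match of $pattern.
sub every_nth_match {
    my ($string, $pattern, $n) = @_;
    my ($count, @hits) = (0);
    while ($string =~ /$pattern/g) {
        push @hits, $1 if ++$count % $n == 0;
    }
    return @hits;
}

# The last two fish are added here so that there is a sixth match.
my $pond   = 'One fish two fish red fish blue fish old fish new fish';
my @thirds = every_nth_match($pond, qr/(\w+)\s+fish\b/i, 3);
print "Every third fish: @thirds\n";
```

Because the helper works on its own copy of the string, the loop always runs to completion and leaves no match position behind on the caller's variable.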

If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.

    $pond  = 'One fish two fish red fish blue fish';

    # using a temporary
    @colors = ($pond =~ /(\w+)\s+fish\b/gi);      # get all matches
    $color  = $colors[2];                         # then the one we want

    # or without a temporary array
    $color = ( $pond =~ /(\w+)\s+fish\b/gi )[2];  # just grab the third element

    print "The third fish in the pond is $color.\n";



The third fish in the pond is red.



Or, to find all even-numbered fish:

    $count = 0;
    $_ = 'One fish two fish red fish blue fish';
    @evens = grep { $count++ % 2 == 1 } /(\w+)\s+fish\b/gi;
    print "Even numbered fish are @evens.\n";



Even numbered fish are two blue.



For substitution, the replacement value should be a code expression that returns the proper string. Make sure to return the original as a replacement string for the cases you aren't interested in changing. Here we fish out the fourth specimen and turn it into a snack:

    $count = 0;
    s{
       \b               # makes next \w more efficient
       ( \w+ )          # this is what we'll be changing
       (
         \s+ fish \b
       )
    }{
        if (++$count == 4) {
            "sushi" . $2;
        } else {
             $1   . $2;
        }
    }gex;



One fish two fish red fish sushi fish
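The same substitution can be wrapped so that N and the replacement are parameters. This is only a sketch; the helper name replace_nth_fish is hypothetical, and it returns a modified copy instead of changing $_ in place.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the word before the Nth "fish" with $new.
sub replace_nth_fish {
    my ($string, $n, $new) = @_;
    my $count = 0;
    # Every match is rewritten; only the Nth one gets the new word,
    # all others are put back exactly as they were.
    $string =~ s{ \b (\w+) (\s+ fish \b) }
                { ++$count == $n ? $new . $2 : $1 . $2 }gex;
    return $string;
}

my $pond = 'One fish two fish red fish blue fish';
print replace_nth_fish($pond, 4, 'sushi'), "\n";
print replace_nth_fish($pond, 2, 'gold'),  "\n";
```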



Picking out the last match instead of the first one is a fairly common task. The easiest way is to skip the beginning part greedily. After /.*\b(\w+)\s+fish\b/, for example, the $1 variable would hold the word before the last fish.
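A short sketch of the greedy skip follows. The /s modifier is an addition here, not part of the pattern quoted above: without it, . cannot cross a newline, so on multiline input the greedy prefix stops at the end of the first line.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish swim here.';

# The greedy .* swallows as much as it can, then backs off just far
# enough for the rest of the pattern to match the final "<word> fish".
my ($last) = $pond =~ /.*\b(\w+)\s+fish\b/s;
print "Last fish is $last.\n";

# On a two-line string the modifier makes a difference.
my $lines = "red fish\nblue fish\n";
my ($without_s) = $lines =~ /.*\b(\w+)\s+fish\b/;
my ($with_s)    = $lines =~ /.*\b(\w+)\s+fish\b/s;
print "Without /s: $without_s; with /s: $with_s\n";
```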

Another way to get arbitrary counts is to make a global match in list context to produce all hits, then extract the desired element of that list:

    $pond = 'One fish two fish red fish blue fish swim here.';
    $color = ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
    print "Last fish is $color.\n";



Last fish is blue.



If you need to express this same notion of finding the last match in a single pattern without /g, you can do so with the negative lookahead assertion (?!THING). When you want the last match of an arbitrary pattern A, you find an A that is not followed, anywhere later in the string, by another A. The general construct is A(?!.*A), which can be broken up for legibility (add /s if the string may contain newlines, so that . can cross them):

    m{
        A               # find some pattern A
        (?!             # mustn't be able to find
            .*          # something
            A           # and A
        )               # anywhere later in the string
    }x

That leaves us with this approach for selecting the last fish:

    $pond = 'One fish two fish red fish blue fish swim here.';
    if ($pond =~ m{
                        \b  (  \w+) \s+ fish \b
                    (?! .* \b fish \b )
                }six )
    {
        print "Last fish is $1.\n";
    } else {
        print "Failed!\n";
    }



Last fish is blue.



This approach has the advantage that it fits in a single pattern, which makes it suitable for situations like the one shown in Recipe 6.17. It has disadvantages, though. It is obviously much harder to read and understand, although once you learn the formula, it's not too bad. It also runs more slowly: around twice as slowly on the data set tested above.
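The negative lookahead formula can be applied to patterns other than the fish. The sketch below assumes a hypothetical helper named last_match and an invented sample string; it wraps the caller's pattern in one capture group and forbids any later occurrence of it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last match of $pattern in $string, or undef.
sub last_match {
    my ($string, $pattern) = @_;
    # Group 1 is the candidate match; the lookahead rejects it whenever
    # the same pattern can still be found somewhere later in the string.
    return $string =~ /($pattern)(?!.*$pattern)/s ? $1 : undef;
}

my $report = 'errors: 12, warnings: 345, notes: 6789 total';
print "Last number: ", last_match($report, qr/\d+/), "\n";
```

The outer group is always the first one in the combined pattern, so $1 refers to the whole match even when the supplied pattern contains capture groups of its own.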

See Also

The behavior of m//g in scalar context is given in the "Regexp Quote-like Operators" section of perlop(1), and in the "Pattern Matching Operators" section of Chapter 2 of Programming Perl; zero-width positive lookahead assertions are shown in the "Regular Expressions" section of perlre(1), and in the "rules of regular expression matching" section of Chapter 2 of Programming Perl.

