
Perl Cookbook

Chapter 20: Web Automation
 
 

20.15. Program: hrefsub

hrefsub makes substitutions in HTML files so that the changes apply only to the text in the HREF fields of <A HREF="..."> tags. For instance, if you had the scooby.html file from the previous recipe and you've moved shergold.html to be cards.html, you need only say:

% hrefsub shergold.html cards.html scooby.html
<HTML><HEAD><TITLE>Hi!</TITLE></HEAD><BODY>
<H1>Welcome to Scooby World!</H1>
I have <A HREF="pictures.html">pictures</A> of the crazy dog
himself.  Here's one!<P>
<IMG SRC="scooby.jpg" ALT="Good doggy!"><P>
<BLINK>He's my hero!</BLINK>  I would like to meet him some day,
and get my picture taken with him.<P>
P.S. I am deathly ill.  <a href="cards.html">Please send
cards</A>.
</BODY></HTML>
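The \Q in hrefsub's s/\Q$from/$to/g matters because file names are full of regex metacharacters: without it, the "." in shergold.html would match any character at all. The following minimal sketch, using only core Perl, contrasts the quoted and unquoted forms; the sub names replace_literal and replace_regex are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: quoted pattern, as hrefsub uses.
sub replace_literal {
    my ($s, $from, $to) = @_;
    $s =~ s/\Q$from/$to/g;    # metacharacters in $from are matched literally
    return $s;
}

# Hypothetical helper: unquoted pattern, for contrast.
sub replace_regex {
    my ($s, $from, $to) = @_;
    $s =~ s/$from/$to/g;      # the "." in $from matches any character
    return $s;
}

for my $href ('shergold.html', 'shergoldXhtml') {
    printf "%-14s quoted: %-14s unquoted: %s\n", $href,
        replace_literal($href, 'shergold.html', 'cards.html'),
        replace_regex($href,   'shergold.html', 'cards.html');
}
```

Both forms rewrite the real link, but only the unquoted one also mangles shergoldXhtml, which is why hrefsub quotes the user's search string.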



The HTML::Filter manual page has a BUGS section that says:

Comments in declarations are removed from the declarations and then inserted as separate comments after the declaration. If you turn on strict_comment(), then comments with embedded "--" are split into multiple comments.

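This version of hrefsub always lowercases the <a> tag name and its attribute names when a substitution occurs, because it rebuilds the tag from the parsed attributes rather than editing the original text. Also, some people may not be happy with having their 8-bit Latin-1 characters replaced by ugly entities, and hrefsub does that, too, just as htmlsub does (here only in the attribute values of the tags it rewrites). htmlsub has a further problem: if the search string spans several words, the text given to MyFilter->text may be broken so that those words never come together, and the substitution does not work. There should probably be a new option to HTML::Parser to make it not return text until the whole segment has been seen. hrefsub is not affected by that, because it examines only attribute values, which arrive complete with each start tag.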
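If the entities are unwelcome, the HTML::Entities documentation shows that encode_entities takes an optional second argument naming the characters to encode, so calling encode_entities($attr->{$_}, '<>&"') inside hrefsub would leave Latin-1 characters alone. The sketch below shows the same idea in core Perl; encode_minimal is a hypothetical helper, not part of hrefsub, and it assumes the values are written inside double quotes, as hrefsub writes them.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of hrefsub: escape only the characters that
# are unsafe inside a double-quoted attribute value. High-bit Latin-1 bytes
# such as "\xE9" (e acute) pass through unchanged.
my %ESCAPE = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub encode_minimal {
    my ($value) = @_;
    $value =~ s/([&<>"])/$ESCAPE{$1}/g;
    return $value;
}

my $href = "caf\xE9.html?a=1&b=2";
my $safe = encode_minimal($href);
print "$safe\n";    # the e-acute byte survives; only the "&" is escaped
```

The trade-off is that the output file must then be served in the same 8-bit encoding as the input, which is exactly what encoding everything as entities avoided.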

Example 20.12: hrefsub

#!/usr/bin/perl -w
# hrefsub - make substitutions in <A HREF="..."> fields of HTML files
# from Gisle Aas
use strict;

sub usage { die "Usage: $0 <from> <to> <file>...\n" }

my $from = shift or usage;
my $to   = shift or usage;
usage unless @ARGV;

# The HTML::Filter subclass to do the substitution.

package MyFilter;
require HTML::Filter;
our @ISA = qw(HTML::Filter);
use HTML::Entities qw(encode_entities);

sub start {
    my ($self, $tag, $attr, $attrseq, $orig) = @_;
    if ($tag eq 'a' && exists $attr->{href}) {
        if ($attr->{href} =~ s/\Q$from/$to/g) {
            # must reconstruct the start tag based on $tag and $attr.
            # wish we instead were told the extent of the 'href' value
            # in $orig.
            my $tmp = "<$tag";
            for (@$attrseq) {
                my $encoded = encode_entities($attr->{$_});
                $tmp .= qq( $_="$encoded");
            }
            $tmp .= ">";
            $self->output($tmp);
            return;
        }
    }
    $self->output($orig);    # unchanged tags are passed through verbatim
}

# Now use the class; the filtered HTML goes to standard output.

package main;
foreach (@ARGV) {
    MyFilter->new->parse_file($_);
}
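To see what the start handler emits without installing HTML::Filter, its logic can be lifted into a plain function. This is a sketch under stated assumptions: rewrite_a_tag is a hypothetical name, its first four arguments mimic the ones HTML::Filter passes to start (with $from and $to added), and it escapes only & and " where hrefsub calls encode_entities.

```perl
use strict;
use warnings;

# Hypothetical stand-alone copy of the logic in MyFilter::start.
# Returns the rebuilt tag when the href changed, else the original text.
# For brevity it escapes only & and " in values (hrefsub uses encode_entities).
sub rewrite_a_tag {
    my ($tag, $attr, $attrseq, $orig, $from, $to) = @_;
    return $orig unless $tag eq 'a' && exists $attr->{href};
    return $orig unless $attr->{href} =~ s/\Q$from/$to/g;

    my $rebuilt = "<$tag";
    for my $name (@$attrseq) {            # keep the original attribute order
        (my $value = $attr->{$name}) =~ s/&/&amp;/g;
        $value =~ s/"/&quot;/g;
        $rebuilt .= qq( $name="$value");
    }
    return "$rebuilt>";
}

# The parser reports lowercased tag and attribute names, so the rebuilt
# tag comes out in lowercase even though the original was uppercase.
my %attr = (href => 'shergold.html');
my $out  = rewrite_a_tag('a', \%attr, ['href'], '<A HREF="shergold.html">',
                         'shergold.html', 'cards.html');
print "$out\n";
```

This prints <a href="cards.html">, the same lowercased link that appears in the sample output, while a tag whose href does not match is returned exactly as it was written.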

