I wanted to write a perl script to show all the HREF
references in a document, mainly to assist me with some
web design.
I have only tested it on my *.htm files, so it is far
from rigourously tested!
Also, I solved the problem of ignoring the case of
the search string...just add the letter i to the command.
#!/usr/local/bin/perl
foreach $file (<*.htm*>) {
print "$file ";
open(HTFILE, $file);
$notitle = 1;
while (<HTFILE>) {
chop;
if (/^\<TITLE\>/i) {
s/<TITLE>//;
s/<\/TITLE>//;
print "($_)\n";
$notitle = 0;
}
if (/HREF=.*>/i) {
$line = $_;
$line =~ s/href/HREF/;
$where = index($line, "HREF");
$part1 = substr($line, ($where+4), 99);
$rapos = index($part1,">");
$part2 = substr($part1, 1, ($rapos-1));
print " --> $part2\n";
}
}
if ($notitle == 1) { print "\n" };
close(HTFILE);
}
Sample output (heavily edited)
creative.htm (<title>Creativity Home Page</title>)
--> "charter.htm"
--> "whatsnew.htm"
--> "cgroup1.htm"
--> "cgroup2.htm"
--> "cgroup3.htm"
--> "suncrfaq.htm"
--> "crquote.htm"
--> "craffirm.htm"
--> "famous.htm"
--> "crbrain.htm"
--> "cgroup8.htm"
--> "crcredit.htm"
--> "mailto:[email protected]"
crebooks.htm (Books related to Creativity)
--> "mailto:[email protected]"
--> "AAdams.htm"
--> B13159.htm
crsware.htm (Creativity Software)
--> #brainstorm
--> #ckm
--> #cm1
--> #cork
--> #gcope
--> #gsys
--> #ide
--> #ideplus
--> #ifisher
sylvan.htm (CREATIVITY</HEAD>)
--> "http://www.shore.net/~quantum/"
--> "#8"
--> "#15"
--> "#18"
--> "#7"
--> "#10"
--> "#9"
--> "#12"
--> "#14"
--> "#16"
--> "#20"
--> "#10"
--> "#6"
--> "#7"
--> "#2"
--> "#5"
--> "#15"
--> "#6"
--> "#3"
--> "#16"
--> "#17"
--> "mailto:[email protected]"
whatsnew.htm (What's new on the Creativity Web Pages)
--> "crbrain.htm"
--> "crebooks.htm"
--> "creative.htm"
---------------------------------------------------------------------------
Charles Cave ~ .-_|\ Phone +61 2 416 6877
Customer Services Manager / \ ~ Fax +61 2 416 2086
Unidata Australasia \.--._*<--- Level 2, 280 Pacific Hwy
[email protected] ~ v Lindfield, NSW, 2070 AUSTRALIA
---------------------------------------------------------------------------