PHP RegEx Help
22-10-2010, 12:23   #1
AbyssUnderground

Hi all,

I have a small issue I can't seem to solve with this code, which counts the number of words on an HTML page:

Code:
$PageDataStripped = $PageData; // $PageData is the source code of any HTML page

// Strip out anything between < and > (strip_tags() not used because it seems to remove some normal text too)
$PageDataStripped = preg_replace("/<(.*)>/iU", " ", $PageDataStripped);

// Match each word on the page
preg_match_all("/([a-zA-Z0-9]*) /iU", $PageDataStripped, $wordCount);

// Debugging
echo "<pre>";
print_r($wordCount);
echo "</pre>";

// Cycle through the array, skip empty values, and increase the count variable
$wordCountfor = $wordCount[1];
$wordsOnPage = 0;
foreach ($wordCountfor as $word) {
    if (!($word == "" || $word == " " || $word == "  ")) {
        $wordsOnPage++;
        echo $word." ";
    }
}
The preg_replace() line (highlighted in red in the original post) seems to remove all of the code and replace it with nothing, rather than replacing only what sits between each pair of angle brackets. In a RegEx helper the pattern works fine, but PHP just doesn't parse it the same way.
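
One thing worth ruling out, offered as an assumption rather than a diagnosis of this particular page: preg_replace() returns NULL when PCRE reports an error (for example when pcre.backtrack_limit is exceeded), and NULL echoes as an empty string, which looks like "everything was replaced with nothing". The sketch below checks for that case and uses a negated character class instead of (.*) with the U modifier. The function name stripTagsWithRegex is hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): replace every <...> tag with a space.
function stripTagsWithRegex(string $html): string
{
    // [^>]* stops at the first closing bracket, so no U modifier is needed,
    // and unlike "." it also matches newlines inside a tag.
    $result = preg_replace('/<[^>]*>/', ' ', $html);

    // preg_replace() returns null on a PCRE error; surface it instead of
    // silently carrying on with an empty value.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    return $result;
}

// Usage: tags become spaces, the text between them is kept.
echo stripTagsWithRegex("<p>Hello <b>world</b></p>");
```

If the exception fires, comparing the code from preg_last_error() against the PREG_*_ERROR constants shows which limit or error was hit.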

Am I missing something?

Thanks in advance.

Andy

---------- Post added at 11:23 ---------- Previous post was at 11:11 ----------

Looks like I may have solved it (typical that eh?):

Code:
 

$PageDataStripped = $PageData;
$PageDataStripped = preg_replace("/<script(.*)<\/script>/iU", " ", $PageDataStripped);
//echo $PageDataStripped;
$PageDataStripped = strip_tags($PageDataStripped);
//$PageDataStripped = preg_replace("/<(.*)>/iU"," ",$PageDataStripped);

preg_match_all("/([a-zA-Z0-9:;,\.\'\"\?@£$%&\!]*) /iU", $PageDataStripped, $wordCount);
//echo "<pre>";
//print_r($wordCount);
//echo "</pre>";

$wordCountfor = $wordCount[1];
$wordsOnPage = 0;
foreach ($wordCountfor as $word) {
    if (!($word == "" || $word == " " || $word == "  ")) {
        $wordsOnPage++;
        echo $word." ";
    }
}
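
A few caveats with the version above: without the s modifier, "." does not match newlines, so a script block spread over several lines is not removed by the first preg_replace(); the word pattern only matches a word that is followed by a space, so a word at the end of a line or of the page is skipped; and strip_tags() can glue together words from adjacent tags. A minimal sketch of one way around these, assuming that "word" simply means any run of non-whitespace characters (a looser definition than the character class in the post). The function name countWordsInHtml and the handling of style blocks are additions for illustration.

```php
<?php
// Hypothetical helper (not from the original post): count whitespace-separated words in HTML.
function countWordsInHtml(string $html): int
{
    // The s modifier lets "." match newlines, so multi-line blocks are removed.
    // Removing <style> as well as <script> is an assumption added here.
    $text = preg_replace('/<(script|style)\b.*?<\/\1>/is', ' ', $html);
    if ($text === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    // Put a space before each tag so strip_tags() does not join
    // neighbouring words, e.g. "<p>one</p><p>two</p>".
    $text = strip_tags(str_replace('<', ' <', $text));

    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return $words === false ? 0 : count($words);
}

// Usage
$wordsOnPage = countWordsInHtml("<p>one</p><p>two three</p>");
echo $wordsOnPage;
```

Splitting on whitespace avoids the empty-string checks in the foreach loop, because PREG_SPLIT_NO_EMPTY never returns empty pieces.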