Post by pendulum » Mon Sep 17, 2012 8:37 am

I'm a PHP n00b, but I've been trying to write a script that fetches a URL, parses the response, and echoes the results. It works now (sort of), but I know there's a better way of doing this. Would anyone care to take a look and help me improve it?

<?php // Score Scraper ?>

   <link rel="stylesheet" type="text/css" href="scraper.css">

<?php
         // Report all PHP errors (pull in prod. vers.)
         // error_reporting(E_ALL);

         // Data source. Don't touch!
         $data = file_get_contents('');

         // Creating data validation stamp from nfl_s_stamp
         preg_match_all('/[0-9]{10}/', $data, $stamp);

         // Printing time stamp
         echo "<p>Data Validation: ".$stamp[0][0]."</p>";

         // Let's pull 1 game from $data and split it up into teams and scores.

            // We'll close PHP so we can add some lame HTML shtuff here:
?>
         <div id="scoreTicker">
            <h1>Latest Scores</h1>
         <!-- Now we'll re-open PHP and do the good stuff -->
<?php
            // Here we'll pull each game from $data. Let's hope this still works next week
            preg_match_all('/&\w*_left[0-9]*=\^?\w*%20\w*%20(%20|\w*\.?)%20\^?\w*(%20[A-Z][a-z]*|%20)?(%20[0-9]*%20|%20)?(%20[0-9]*%20)?(\(\w*\)|\(\d:\d*%20(AM|PM)?%20[A-Z]*\))&\w*_right[0-9]*_count\W[0-9]*&\w*\W\w*\W*\w*\W\w*\W\w*\W\w*\W\w*\W\w*\?\w*\W[0-9]{9,9}/', $data, $games);

            foreach ($games[0] as $game) {
               // pulling team one from $game
               preg_match_all('/&\w*_left[0-9]*\W\^?\w*(%20[A-Z][a-z]*)?/', $game, $teamOne);
               // pulling team two from $game
               preg_match_all('/([0-9][0-9]%20%20|at)%20\^?[A-Z]*[a-z]*%20?[A-Z]*[a-z]*/', $game, $teamTwo);
               // pulling score one from $game
               preg_match_all('/%20[0-9][0-9]%20%20%20/', $game, $scoreOne);
               // pulling score two from $game
               preg_match_all('/%20[0-9][0-9]%20\(/', $game, $scoreTwo);
               // pulling notes from $game
               preg_match_all('/\(\w*\)|\(\d\D\d*%20(PM|AM)%20\w*\)/', $game, $notes);
               // Below we print everything we assigned above. Each value gets a preg_replace run on it to remove the ugly characters
               echo "<div class=\"game\">
                     <div class=\"notes\">
                     ".preg_replace('/%20/', ' ', $notes[0][0])."
                     </div><!-- end notes -->
                     <div class=\"home\">";
               echo "<table><tr><td class=\"team\">".preg_replace('/&\w*_left[0-9]*\W/', '', $teamOne[0][0])."</td><td class=\"score\">".preg_replace('/%20/', '', $scoreOne[0][0])."</td></tr></table>";
               echo "</div><div class=\"away\">
               <table><tr><td class=\"team\">".preg_replace('/at%20|\d*\W\d*\W\d*\^|^[0-9]*%/', '', $teamTwo[0][0])."</td><td class=\"score\">".preg_replace('/%20|\(/', '', $scoreTwo[0][0])."</td></tr></table></div></div>";
            } // end foreach

            // Close the scoreTicker div opened above
            echo "</div><!-- end scoreTicker -->";


Post by johnj » Mon Sep 17, 2012 10:59 pm

You need to post a sample URL. Only then can somebody tell whether there is a better way to match and extract the data.
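
In the meantime, here is a rough sketch of one possible direction, not a drop-in fix. It assumes the response is an &-separated, URL-encoded key=value string, which the patterns in the script suggest (keys ending in _left1, _left2 and so on, plus nfl_s_stamp). If that holds, parse_str() can do the splitting and the %20 decoding, and one small pattern per game line is enough. The sample $data, the team names, the two line layouts, and the helper parseGame() are all made up for illustration; the real feed may look different.

```php
<?php
// Made-up sample payload, inferred from the regexes in the original script.
// The real feed is an assumption here; key names and layout may differ.
$data = '&nfl_s_stamp=1234567890'
      . '&nfl_s_left1=Alpha%2021%20%20%20^Beta%20City%2027%20(FINAL)'
      . '&nfl_s_right1_count=0'
      . '&nfl_s_left2=Gamma%20at%20Delta%20(1:00%20PM%20ET)'
      . '&nfl_s_right2_count=0';

// parse_str() splits on "&" and "=" and URL-decodes each value (%20 becomes a space).
parse_str($data, $fields);

// The validation stamp is read by key instead of searching for 10 digits.
$stamp = $fields['nfl_s_stamp'] ?? null;

/**
 * Hypothetical helper: split one decoded game line into its parts.
 * Returns null if the line matches neither assumed layout.
 */
function parseGame(string $line): ?array
{
    // Game with scores, e.g. "Alpha 21   ^Beta City 27 (FINAL)"
    $played = '/^\^?(?<teamOne>.+?)\s+(?<scoreOne>\d+)\s+\^?(?<teamTwo>.+?)\s+(?<scoreTwo>\d+)\s+\((?<notes>[^)]*)\)$/';
    // Game not started yet, e.g. "Gamma at Delta (1:00 PM ET)"
    $scheduled = '/^(?<teamOne>.+?)\s+at\s+(?<teamTwo>.+?)\s+\((?<notes>[^)]*)\)$/';

    if (!preg_match($played, $line, $m) && !preg_match($scheduled, $line, $m)) {
        return null;
    }

    return [
        'teamOne'  => $m['teamOne'],
        'scoreOne' => $m['scoreOne'] ?? null, // null for scheduled games
        'teamTwo'  => $m['teamTwo'],
        'scoreTwo' => $m['scoreTwo'] ?? null,
        'notes'    => $m['notes'],
    ];
}

$games = [];
foreach ($fields as $key => $value) {
    // Assumption: keys ending in "_left<number>" hold the game lines.
    if (preg_match('/_left\d+$/', $key) && ($game = parseGame(trim($value))) !== null) {
        $games[] = $game;
    }
}

foreach ($games as $game) {
    // htmlspecialchars() keeps feed text from being treated as markup.
    echo htmlspecialchars(implode(' | ', array_map('strval', $game))), "\n";
}
```

Scheduled games come back with null scores, so the output code can skip the score cells rather than index into an empty match array. Once a real sample is posted, the two patterns in parseGame() are the only part that should need adjusting.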

