
Who knows data scraping?


Post by robertoshayerlyra » Fri Jun 27, 2014 6:12 am

I need to find people who know about data scraping. Is this the right place?

I'm trying to scrape data from our local government. What I want is the addresses of the adoption offices. Here, all adoptions go through the government.
So far I have the URL of one office, and there are 2 or 3 thousand more. But if I can manage to get one, the others will be easy.
I have made many attempts; below I show two.
I suspect the problem could be related to JavaScript (maybe Ajax) that refreshes the page.

As you can all see, I am not the smartest PHP developer. I am a bit old: I started with MUMPS, which is a language that nobody uses anymore (thank God). That means I am too old to learn new tricks.

Thanks to all those who had the patience to read this far.


Code: Select all
// First attempt

echo '<html><head></head><body>';
echo '<h1>Scraper PHP GET 1</h1>';

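// Prints "1" when allow_url_fopen is enabled; it is echoed twice, hence the "11" in the output below
echo ini_get("allow_url_fopen");
echo ini_get("allow_url_fopen");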

// I used this url for test
//$url = 'http://www.portaldaadocao.com.br';

//This is the URL that I really want
$url = 'http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php?transacao=CONSULTA&vara=2673';

$html = file_get_contents($url);
var_dump($html);

echo '</body></html>';

// Output
// 11
// Warning: file_get_contents(http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php?transacao=CONSULTA&vara=2673) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /home/rsl/www/sc01_get.php on line 14
// bool(false)




Code: Select all
// Second attempt

echo '<html><head></head><body>';
echo '<h1>Scraper PHP CURL 3</h1>';

// I used this url for test
//$url = 'http://www.portaldaadocao.com.br';

//This is the URL that I really want
$url = 'http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php?transacao=CONSULTA&vara=2673';

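$curl = curl_init($url);
// Note: this sends a POST with the body "foo", although the URL carries GET-style query parameters
@curl_setopt($curl, CURLOPT_POSTFIELDS, "foo");
@curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
@curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "POST");

// CURLOPT_RETURNTRANSFER is not set, so curl_exec() prints the response itself
// and returns true on success; $html is therefore true, which echoes as "1"
$html = @curl_exec($curl);

if (!$html) {
    echo "<br />cURL error number:" . curl_errno($curl);
    echo "<br />cURL error:" . curl_error($curl);
    exit;
} else {
    echo '<br>begin HTML[';
    echo $html;
    echo '<br>]end html ';
}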
echo '</body></html>';

// Output
// 1
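
For comparison with the two attempts above, here is a minimal sketch of a plain GET request with cURL that returns the page body as a string and reports the HTTP status code, so a 404 can be told apart from a transport error. The function names `build_office_url` and `fetch_page`, the timeouts, and the user-agent string are hypothetical choices for illustration. It is also an assumption that the `vara` parameter is what identifies each office, and a browser-like user agent may or may not change how the server responds.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds the query URL for one office.
// Assumption: the "vara" parameter is the only part that changes per office.
function build_office_url(int $vara): string
{
    $base = 'http://www.cnj.jus.br/cna/Controle/ConsultaPublicaBuscaControle.php';
    $query = http_build_query(['transacao' => 'CONSULTA', 'vara' => $vara], '', '&');
    return $base . '?' . $query;
}

// Hypothetical helper: fetches a URL with GET and returns status, body and error text.
function fetch_page(string $url): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // assumed timeouts, in seconds
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)', // assumed value
    ]);

    $body = curl_exec($curl); // string on success, false on a transport error

    return [
        'status' => (int) curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'body'   => $body === false ? null : $body,
        'error'  => $body === false ? curl_error($curl) : null,
    ];
}
```

A call such as `$page = fetch_page(build_office_url(2673));` would then allow checking `$page['status']` before looking at `$page['body']`. If the data really is loaded later by JavaScript, as suspected above, the body returned here would not contain it, and the request the page makes in the background would have to be fetched instead.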

Re: Who knows data scraping?

Post by kladrian » Thu Aug 14, 2014 1:32 am

Hi,
You should try this: http://simplehtmldom.sourceforge.net/
It's a very easy-to-use yet powerful DOM inspector.

You can load your remote HTML page and try to extract information as you would with JavaScript.
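
To illustrate the idea of loading HTML and extracting pieces of it, here is a minimal sketch. It uses PHP's built-in `DOMDocument` and `DOMXPath` rather than the library linked above, so that it runs without any download. The function name `extract_cells`, the sample markup, and the `value` class are all hypothetical; the real page's structure is not known here, so the XPath expression would need to be adapted to it.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the trimmed text of every node matching an XPath expression.
function extract_cells(string $html, string $expression): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($expression);
    if ($nodes === false) {
        return []; // the XPath expression was invalid
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up fragment standing in for a downloaded page.
$sample = '<html><body><table>'
    . '<tr><td class="label">Address</td><td class="value">Example Street 100</td></tr>'
    . '<tr><td class="label">City</td><td class="value">Example City</td></tr>'
    . '</table></body></html>';

$values = extract_cells($sample, '//td[@class="value"]');
print_r($values); // the two "value" cells, in document order
```

The same pattern applies to a page fetched from the network: pass the downloaded string as the first argument and adjust the expression to the elements that hold the addresses.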

:)


