PHP Curl - 403 / 1020 error when scraping web-domain

garethphp · October 21, 2021, 8:40pm

Hi,
I have suddenly started receiving HTTP status 403 with error code 1020 when using curl to scrape a website that I have been scraping for several years. I can load the website fine in my web browser from the same device / IP as the web server.

Any suggestions would be greatly appreciated. I have included the code below, referencing the relevant website:

function get_data($url) 
{
	$ch = curl_init();
	
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0");
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);     // 0 = wait indefinitely for the connection
	curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in $data
	
	$data = curl_exec($ch);
	
	if (curl_errno($ch)) 
	{
		// Transport-level failure (DNS, TLS, timeout): there is no HTTP code to inspect
		echo 'cURL error: ', curl_error($ch), "\n";
	}
	else
	{
		$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
		if ($http_code !== 200) 
		{
			echo 'Unexpected HTTP code: ', $http_code, "\n";
		}
	}
	curl_close($ch);
	return $data;
}

$url = "https://www.oddschecker.com/";

$returned_content = get_data($url);
echo "<br>" . $returned_content;

skawid · October 22, 2021, 7:37am

They’re on to you; the website has put something in place that forbids requests that look like web scraping. You need to learn how to spoof those kinds of checks by making your requests look like they come from legitimate users.

garethphp · October 22, 2021, 8:12am

@skawid Thanks for the input - I thought this may be the case.

Does anyone have any experience bypassing these kinds of checks?

megalan · November 17, 2021, 6:45am

This seems to be SSL fingerprinting protection from Cloudflare (https://blog.cloudflare.com/monsters-in-the-middleboxes/). I can confirm that curl fails to download the HTML of this website, while ScrapeNinja (by restyler, on RapidAPI), which emulates a Chrome fingerprint, works fine.
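
To confirm that a failing response really is a Cloudflare block (and not some other 403), something like the rough sketch below may help. It assumes the raw response was fetched with CURLOPT_HEADER enabled, as in get_data() above, and that the header size comes from curl_getinfo($ch, CURLINFO_HEADER_SIZE). The names split_response() and looks_like_cloudflare_block() are made up for illustration, the sample response is hand-written, and the check itself (403 status, a "Server: cloudflare" header, and "1020" in the body) is only a heuristic.

```php
<?php
// Hypothetical helpers; the names are illustrative, not part of any library.

// Split a raw cURL response (fetched with CURLOPT_HEADER = true) into its
// header block and body, using the size reported by CURLINFO_HEADER_SIZE.
function split_response(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

// Heuristic check: 403 status, a "Server: cloudflare" header, and "1020"
// somewhere in the body. This is an assumption about the response shape.
function looks_like_cloudflare_block(int $httpCode, string $headers, string $body): bool
{
    $servedByCloudflare = preg_match('/^server:\s*cloudflare\s*$/mi', $headers) === 1;
    return $httpCode === 403 && $servedByCloudflare && strpos($body, '1020') !== false;
}

// Offline demonstration with a hand-written sample response (assumed shape).
$sampleHeaders = "HTTP/2 403\r\nserver: cloudflare\r\ncf-ray: example\r\n\r\n";
$sampleRaw = $sampleHeaders . "error code: 1020";

[$headers, $body] = split_response($sampleRaw, strlen($sampleHeaders));
var_dump(looks_like_cloudflare_block(403, $headers, $body));
```

In a real run, the header size and status code would be read with curl_getinfo() after curl_exec() and before curl_close(), then passed to these helpers along with the raw response.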

garethphp · November 18, 2021, 11:33pm

@megalan Thanks for the input. I was pulling my hair out trying to get around the Cloudflare block.

I just hooked up RapidAPI to my scraping and all is working now. Thanks again so much!

KillBill · February 14, 2022, 9:30am

Hello my friend, I am facing the same problem! How did you solve your problem with the Cloudflare protection? Did you solve it with PHP, or did you just write a completely new script in Python?