PHP cURL: 403 / 1020 error when scraping a website

I have suddenly started receiving HTTP status 403 and error code 1020 when using cURL to scrape a website that I have been scraping for several years. I can load the website fine in my web browser from the same device and IP address as the web server.

Any suggestions would be greatly appreciated. I have included the code I use for the website below:

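function get_data($url) {
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0");
	curl_setopt($ch, CURLOPT_URL, $url);
	// Disables TLS certificate verification (insecure; kept as in the original).
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the response instead of printing it
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);   // 0 = wait indefinitely for the connection
	curl_setopt($ch, CURLOPT_HEADER, true);        // include the response headers in $data
	$data = curl_exec($ch);
	if (!curl_errno($ch)) {
		switch ($http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE)) {
			case 200:  # OK
				break;
			default:
				echo 'Unexpected HTTP code: ', $http_code, "\n";
		}
	} else {
		// Transport-level failure (DNS, TLS handshake, timeout, ...)
		echo 'cURL error: ', curl_error($ch), "\n";
	}
	return $data;
}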

$url = "";

$returned_content = get_data($url);
echo "<Br>".$returned_content;
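The function above only echoes unexpected status codes, so a blocked request still passes the error page on as if it were content. A minimal sketch of recognising the 1020 block explicitly follows. `is_cloudflare_1020` is a hypothetical helper, and the body markers it looks for ("error code: 1020" or "Error 1020") are an assumption about what Cloudflare's block responses contain, not a documented format:

```php
<?php
// Hypothetical helper: returns true when a response looks like a
// Cloudflare "error 1020" (access denied by a firewall rule) block.
// $raw_response may include headers (CURLOPT_HEADER = true) and body.
// ASSUMPTION: the block page mentions "error code: 1020" or "Error 1020".
function is_cloudflare_1020(int $http_code, string $raw_response): bool
{
    // Cloudflare serves the 1020 block with HTTP status 403.
    if ($http_code !== 403) {
        return false;
    }
    // Matches "error code: 1020" and "Error 1020", case-insensitively.
    return preg_match('/error\s+(code:\s*)?1020/i', $raw_response) === 1;
}
```

It could be called from the `default:` branch of the `switch` in `get_data`, for example `if (is_cloudflare_1020($http_code, $data)) { echo "Blocked by a Cloudflare firewall rule (1020)\n"; }`, so that the script reports the block instead of treating the error page as scraped content.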

They’re on to you; the website has put something in place that forbids requests that look like web scraping. You need to learn how to spoof those kinds of checks by making your requests look like they come from legitimate users.

@skawid Thanks for the input. I thought this might be the case.

Does anyone have any experience with getting past these kinds of checks?

This seems to be SSL/TLS fingerprinting protection from Cloudflare. I can confirm that cURL fails to download the HTML of this website, while the ScrapeNinja API (by restyler) on RapidAPI, which emulates a Chrome fingerprint, works fine.

@megalan Thanks for the input. I was pulling my hair out trying to get around the Cloudflare block.

I just hooked RapidAPI up to my scraper and everything is working now. Thanks again so much!
