file_get_contents / curl weird encoding from site? How to fix?

whitedragond · May 31, 2013, 3:47pm

I’m trying to use CURL and/or file_get_contents

curl_setopt($ch, CURLOPT_URL, "http://freecampsites.net/attribute/states/ut/");

to pull back the web page code. The problem is, the encoding (correct word?) comes back looking like:


‹������í}k{Û6ÒèçøW LeÙ»¦$êæ»z'iÒM×vê·§íã‘Äˆ"X^,«Ýþ÷wf�R¤$Û”âtóìi›D$.ƒÁ`€ÃÃ¯^¼?¹üéì%eÆc}xþöÍ	3ÌZíªyR«½¸|Áþçõå»·ÌªÖÙeÈýÈ]és¯V{yj0cÇÁ~6™Lª“fU†ƒÚåyíaYXY?šq®fÕ‰£»qH
ÞŽ=?:ZÆÚÛÛSµ©¬àBÙw=‘•Œƒ•½íû5ËÂr‘ºAÌâi�Åbqe×>òe®R!ƒ±e²ëÿ©Ÿÿ›ýüëdà[5H¢áæÏ•ëHÄÇ¶-?®l³Ê‡c³µSïX³YùukYérÌ]ÿ”Vè‡BØ|@—ETõE¼¬ÐÄñ¸qÅ„
@‰Í~âÛH§Í-öa¼+„aÐu¤Œ…WíPðX¼ô¾mVTï*[aP¬Š=‡²•¹¾W(“GSß†Ü8LÄ‡Ä(Ä¤Í
5Ú¯°£\Kž´9"SÒÇÒ–û–é‚µZy¶¯ÞÕØU¶Ø?Y¥:rà	“ÃXOc×Žª¶× ¥Qå ëR”ïÑ@Äº;Ñóé% gû¹þë‹ª¡À©tDÕõ#ÆÏE_†bsÀ·YDäýsk©xXS5Æ"æÌahGÆÝþÔ¼±fK?PGÆsïõwßÿp|~qüÃèäöüÿžÄ?adOÆ/Þ½>9Ýù±w<=ÿÞúW²ûáÈ`5` »lŠß÷æÈ8QÀÌK |4
�²ï³‡<9úpùÊÜÕ`b7öD—±c(îI³7‘Ç}‡]Ä0´ìŒ‡#f²‹)°H‰mö!æCöoö
øŠ¤ŒuXSP�œçú#
ïÈˆâ©'¢¡±Á†¡èg³e'k“ÀÔèÖâ!P?ªñ1ÿÝõaæ�Ú5Uµ£èÛFÝ²êmkeù©9eÇåÐ,0¥ðuç¨â|É.ÐÓ@†ÀönlöìÖƒ	0a–9ûIèm®¨;†YÕz3…ß¬~0#þDf@lºyq/¡„N»ÃƒÀs«×Â(ú'¬J…Ä=2Š4gçì•ÎƒÔíC¡$×r�Hc¯¬
„]

Here is the header info:

HTTP/1.1 200 OK
Date: Thu, 30 May 2013 19:18:06 GMT
Server: Apache/2.2.24 (FreeBSD) mod_fcgid/2.3.6
X-Powered-By: PHP/5.4.13
Vary: Accept-Encoding,Cookie
Cache-Control: max-age=3, must-revalidate
WP-Super-Cache: Served supercache file from PHP
Content-Encoding: gzip
Content-Length: 10314
Last-Modified: Sun, 26 May 2013 15:32:28 GMT
Cache-Control: no-store, no-cache, must-revalidate, max-age=0
Pragma: no-cache
Content-Type: text/html; charset=UTF-8

Can anyone help me with this? The same function works on a different site, so I’m assuming it is something with how their page is coded.

Thanks

whitedragond · May 31, 2013, 3:55pm

The entire .php code:

[php]

<?PHP
$CurrentTimeDate = date("Y-m-d_G_i_s");

//Not sure if this is working or doing anything
$headers = array(
    'Content-Encoding: gzip',
    'Content-Type: text/html',
    'charset: UTF-8'
);

$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_URL, "http://freecampsites.net/attribute/states/ut/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);

// grab URL and pass it to the browser
$out = curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);

$CurlFileToWrite = 'cURL_file_get_contents_'.$CurrentTimeDate.'.txt';
$fp = fopen($CurlFileToWrite, 'w');
fwrite($fp, $out);
fclose($fp);

echo "Data Written ";
?>
[/php]

sebrenauld · May 31, 2013, 4:17pm

The answer lies in this header:

Content-Encoding: gzip

The server is sending the body gzip-compressed, so tell cURL to decode it:

[php]curl_setopt($ch, CURLOPT_ENCODING, "gzip");[/php]
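
For context, here is a minimal sketch of a whole request with that option set. The helper name `fetch_decoded` is hypothetical and not from this thread. It passes an empty string rather than "gzip", which asks cURL to accept every encoding it was built with and to decode the response automatically.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL and let cURL
// undo whatever Content-Encoding the server applied.
function fetch_decoded(string $url): string
{
    $ch = curl_init($url);

    // Return the body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Empty string: advertise all supported encodings and decode the reply.
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $body;
}
```

Calling `fetch_decoded("http://freecampsites.net/attribute/states/ut/")` should then return readable HTML, assuming the site still answers with gzip.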

Alternatively, you could inflate the gzip data yourself, but there is little point when cURL can do it for you.
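
If you are using file_get_contents rather than cURL, manual decoding is roughly what is left. A rough sketch, assuming PHP 5.4 or later with the zlib extension loaded; `maybe_gunzip` is a hypothetical helper name. It checks for the two gzip magic bytes before calling `gzdecode()`, so uncompressed responses pass through untouched.

```php
<?php
// Hypothetical helper: decode a body only if it starts with the gzip
// magic bytes (0x1f 0x8b); otherwise return it unchanged.
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body;
    }

    $decoded = gzdecode($body);
    if ($decoded === false) {
        throw new RuntimeException('Body looked like gzip but could not be decoded');
    }

    return $decoded;
}

// Local round trip, so the sketch runs without touching the network.
$compressed = gzencode('<html>hello</html>');
echo maybe_gunzip($compressed), "\n";
echo maybe_gunzip('plain text'), "\n";
```

With a real page this would look like `$html = maybe_gunzip(file_get_contents($url));`.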

whitedragond · May 31, 2013, 4:27pm

Whoa, that actually worked! I’m shocked it was so easy. I wondered whether something else was messing it up, but the answer was staring me in the face!!

Thanks for your help!!!