Website data scraping?

I’m not sure whether something like this is feasible or how complicated it would be, but let’s see if anyone can help.

I’d like to know whether it’s possible to create a script that I could give a link to a product details page on one website (always the same site). The script would then take all the information about that product, save it in my database, and possibly translate it from English to the local language.
I know about HTML scraping, but their HTML seems a bit complicated for that, so is there another solution?

Most websites that allow others to use their data provide an API (Application Programming Interface) that returns JSON-encoded data.
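If the site does offer such an API, there is no HTML to untangle at all. Here is a minimal Python sketch using only the standard library. The endpoint URL and the `name`/`price`/`description` keys are hypothetical placeholders, since every real API documents its own URL, authentication and field names:

```python
import json
import urllib.request

# Hypothetical endpoint: a real API documents its own URL, auth and field names.
API_URL = "https://api.example.com/products/{product_id}"


def parse_product(payload):
    """Turn a JSON product response into the dict we want to store."""
    data = json.loads(payload)
    # Assumed keys; missing ones come back as None instead of raising KeyError.
    return {key: data.get(key) for key in ("name", "price", "description")}


def fetch_product(product_id):
    """Download one product from the (hypothetical) API and parse it."""
    url = API_URL.format(product_id=product_id)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_product(response.read())


sample = '{"id": 42, "name": "Desk lamp", "price": 24.99, "description": "LED lamp"}'
print(parse_product(sample))
```

Keeping the parsing separate from the download means you can test it against a saved response without hitting the site.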


No matter how “complicated” the HTML might be, if the browser can parse it and display a page, a script can do that as well.
The only requirement is that the page structure is roughly the same for all products.
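To illustrate, here is a rough sketch with Python’s built-in `html.parser` that pulls fields out of a product page by their `class` attribute. The class names (`product-title`, `product-price`, `product-description`) are made up; you would replace them with whatever the real page uses. It also assumes reasonably well-formed markup inside those elements.

```python
from html.parser import HTMLParser

# Hypothetical class names: replace them with the ones used on the real page.
FIELD_CLASSES = {
    "product-title": "name",
    "product-price": "price",
    "product-description": "description",
}

# Void elements have no end tag, so they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProductPageParser(HTMLParser):
    """Collect the text of the first element carrying each class in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._field = None  # field whose element we are currently inside
        self._depth = 0     # open tags inside that element, including itself

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            field = FIELD_CLASSES.get(cls)
            if field and field not in self.fields:
                self._field, self._depth = field, 1
                self.fields[field] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.fields[self._field] = self.fields[self._field].strip()
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.fields[self._field] += data


def parse_product_page(html):
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    return parser.fields


SAMPLE_HTML = """<div class="product">
<h1 class="product-title"> Desk lamp </h1>
<span class="product-price">24.99 <small>EUR</small></span>
<p class="product-description">LED lamp with <b>USB</b> port</p>
</div>"""

print(parse_product_page(SAMPLE_HTML))
```

Nested tags inside a field (like the `<small>` and `<b>` above) are flattened into plain text.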

For machine translation, you can either use a professional (paid) API such as Google’s or Yandex’s, or you may want to take a look at the open-source LibreTranslate project.

However, all of these translators are trained mostly on longer text and may produce strange results for short product names or attributes.
Test the quality before you commit to a paid plan.
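As a sketch of the LibreTranslate route: its HTTP API accepts a POST to `/translate` with the text in `q` plus `source` and `target` language codes, and returns the result in `translatedText`. The server URL below assumes a self-hosted instance on its default port; the public instance needs an `api_key`, which is why it is optional here.

```python
import json
import urllib.request

# Assumption: a self-hosted LibreTranslate server on its default port.
LIBRETRANSLATE_URL = "http://localhost:5000/translate"


def build_translate_request(text, target, source="en", api_key=None,
                            url=LIBRETRANSLATE_URL):
    """Build the POST request for LibreTranslate's /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    if api_key:  # only needed for instances that require a key
        payload["api_key"] = api_key
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, target, source="en", api_key=None):
    """Send the request and return the translated string."""
    request = build_translate_request(text, target, source, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["translatedText"]


request = build_translate_request("Desk lamp", "de")
print(request.get_method(), json.loads(request.data))
```

Running `translate()` over a handful of your real product names is a cheap way to do the quality test mentioned above.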

I have scraped a lot of sites for various people and websites.
The problem is that websites change all the time, without any notice to you.
But most websites keep the same markup for details like prices, links, or whatever you need, so you can work out the structure of the page in question and scrape it using its “class” attributes. One question, though, is whether this scraper would be just for your own use or for public consumption. If it is for your own use, no problem: if you start getting odd results, you just fix the code. If its output is posted on a website that others have access to, it could be an issue, because other people would see those odd results until you notice and fix them.
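One way to catch those odd results early is to validate every scraped record before it goes into the database, so a layout change stops the script instead of silently saving empty data. A minimal sketch; the required field names and the `LayoutChangedError` class are hypothetical:

```python
# Hypothetical: the fields every product record must contain.
REQUIRED_FIELDS = ("name", "price", "description")


class LayoutChangedError(Exception):
    """Raised when a scraped record no longer looks like a product."""


def check_product(record, url="(unknown page)"):
    """Return the record unchanged, or raise if any required field is missing or empty."""
    missing = [
        name for name in REQUIRED_FIELDS
        if record.get(name) is None or str(record[name]).strip() == ""
    ]
    if missing:
        raise LayoutChangedError(
            f"{url}: missing or empty field(s): {', '.join(missing)}; "
            "the page layout may have changed"
        )
    return record
```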
Often, you just view the source of the page in question (View Source in the browser), search for one of the items you want to scrape, and look at the markup, moving backwards to find the class of the items. Then you do a global search to find all the items and write scraping code to extract them. The code will not be long or take a lot of time to run. You could post the site and the data you want from it, and I or someone here could put together a script to do it. Just an idea!
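The “find the class, then find all items” approach could look roughly like this in Python: collect every link carrying the product class from a listing page, resolve relative URLs, and store them in SQLite so each product page can be scraped afterwards. The `product-link` class, the example URLs and the `product_links` table are all assumptions for illustration.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical: the class the listing page puts on every product link.
LINK_CLASS = "product-link"


class ProductLinkCollector(HTMLParser):
    """Find every <a> whose class list contains LINK_CLASS."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if LINK_CLASS in classes and href:
            # Resolve relative links such as "/p/1" against the listing URL.
            self.links.append(urljoin(self.base_url, href))


def collect_product_links(html, base_url):
    parser = ProductLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def save_links(conn, links):
    """Store links in SQLite; PRIMARY KEY plus INSERT OR IGNORE skips duplicates."""
    conn.execute("CREATE TABLE IF NOT EXISTS product_links (url TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO product_links (url) VALUES (?)",
        [(link,) for link in links],
    )
    conn.commit()


LISTING_HTML = """<ul>
<li><a class="product-link" href="/p/1">Desk lamp</a></li>
<li><a class="product-link featured" href="https://shop.example.com/p/2">Chair</a></li>
<li><a href="/about">About us</a></li>
</ul>"""

links = collect_product_links(LISTING_HTML, "https://shop.example.com/catalog")
conn = sqlite3.connect(":memory:")
save_links(conn, links)
save_links(conn, links)  # running again does not create duplicate rows
print(conn.execute("SELECT url FROM product_links ORDER BY url").fetchall())
```

Swap `":memory:"` for a file path to keep the data between runs.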
