Web Scraping 🔍🔥
source link: https://www.producthunt.com/discussions/web-scraping?ref=hpfeed
Scraping public data from the web, transforming it, and using it for a new product can become a very successful business.
What kind of web scraping projects have you worked on and which tools did you use?
I never finished it - but I started a Strava scraping project. I think there's a ton of suuuuper interesting data in there, although I did it for interest's sake, rather than to monetise it.
And yep, like @berthakgokong says - Python, Beautiful Soup, etc.
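For anyone who hasn't seen what the parsing step looks like, here is a minimal sketch of pulling links out of a page. It is an illustration rather than anyone's actual project code: it uses Python's standard-library `html.parser` as a dependency-free stand-in for Beautiful Soup, and the `LinkExtractor` class, the `extract_links` helper, and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, not taken from any real site.
SAMPLE_HTML = """
<ul>
  <li><a href="/activities/1">Morning ride</a></li>
  <li><a href="/activities/2">Evening <b>run</b></a></li>
</ul>
"""

print(extract_links(SAMPLE_HTML))
```

In a real scraper the HTML would come from an HTTP response instead of a string, and a library like Beautiful Soup makes the selection logic much shorter.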
I had a website that scraped automotive listings and looked at the year, model, mileage, options and price to determine if it was a good deal (this was before everyone was doing it).
I found the whole process of scraping messy and a bit shady (listing sites really wanted to protect their data) so I eventually abandoned it. Data ownership is a very messy subject which I decided to avoid completely.
Decided to build a CMS instead - no reliance on external data :) It is currently in private release and I think it offers quite a few competitive features that separate it from the competition.
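The "is this listing a good deal" check mentioned above could look roughly like the sketch below. This is only a guess at the general idea, not the original site's logic: it compares each price to the median for the same model and year, and ignores mileage and options to stay short. The `flag_good_deals` function, the 0.9 threshold, and the sample listings are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_good_deals(listings, threshold=0.9):
    """Return listings priced below `threshold` times the median price
    of listings with the same model and year."""
    # Group prices by (model, year) so each car is compared with its peers.
    prices = defaultdict(list)
    for item in listings:
        prices[(item["model"], item["year"])].append(item["price"])

    deals = []
    for item in listings:
        typical = median(prices[(item["model"], item["year"])])
        if item["price"] < threshold * typical:
            deals.append(item)
    return deals


# Hypothetical listings, invented for this example.
LISTINGS = [
    {"model": "Civic", "year": 2015, "price": 9000},
    {"model": "Civic", "year": 2015, "price": 10000},
    {"model": "Civic", "year": 2015, "price": 7500},
    {"model": "Golf", "year": 2016, "price": 12000},
    {"model": "Golf", "year": 2016, "price": 12500},
]

for deal in flag_good_deals(LISTINGS):
    print(deal["model"], deal["year"], deal["price"])
```

A fuller version would probably adjust for mileage and options before comparing prices, since those were part of the original comparison.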
@stefan_morris Yes it can be messy. Especially the data ownership. But it's not illegal in general. It really depends on the use-case.
With which tech stack are you building the CMS?
@david_gregorian I agree, it's not necessarily illegal but depending on the site, it can break their Terms of Use agreement, which is where it can get messy.
My CMS is a SaaS platform built with Vue/Nuxt and MongoDB. I'm still ramping up but there's a bit of information on my website (check out the docs) at https://shustudios.com
I'm currently looking for a few beta testers.
Funny thing, I scraped the "Top Most Upvoted Products" using Bardeen.ai (our tool). It worked really nicely.
BUT I wanted to figure out which month is the best to launch, and turns out they haven't updated that page, so now I gotta scrape all the products.
https://www.producthunt.com/e/50...
Let's see where this takes me.
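Once the product data is scraped, the "best month to launch" question comes down to a small aggregation. The sketch below is one possible way to do it, assuming each scraped product boils down to a launch date and an upvote count. The `average_upvotes_by_month` function and the sample rows are hypothetical and do not reflect real Product Hunt data.

```python
from collections import defaultdict
from datetime import date


def average_upvotes_by_month(products):
    """Map month number (1-12) to the mean upvotes of products launched in it."""
    totals = defaultdict(lambda: [0, 0])  # month -> [upvote sum, product count]
    for launched, upvotes in products:
        bucket = totals[launched.month]
        bucket[0] += upvotes
        bucket[1] += 1
    return {month: s / c for month, (s, c) in sorted(totals.items())}


# Hypothetical (launch date, upvotes) rows, invented for this example.
PRODUCTS = [
    (date(2021, 1, 5), 100),
    (date(2021, 1, 20), 300),
    (date(2021, 3, 2), 500),
    (date(2020, 3, 9), 100),
    (date(2021, 7, 1), 250),
]

averages = average_upvotes_by_month(PRODUCTS)
best_month = max(averages, key=averages.get)
print(averages)
print("Best month:", best_month)
```

Averaging per month rather than summing keeps a month with many launches from winning just because of volume.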
Some Projects – LinkedIn, Salesforce (AppExchange), GitHub, Amazon, Food Inspection Scores (Texas), Google, Government Data Sets, Craigslist, Library, lots of sites...
Tools (that I like) – ScrapeStorm, Import.io, ParseHub, Octoparse, Scrapy, RPA Tools (UiPath, Automation Anywhere, etc.), Selenium, CLI (wget, curl, shell scripts)...
Tools vary depending upon task - haven't found one tool that I can consistently use for everything.