Highest Voted 'crawling' Questions - Data Science Stack Exchange

28

votes

7 answers

Publicly available social network datasets/APIs

As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics…

open-source dataset crawling

asked Jun 17 '14 at 05:29

Rubens

4,097
5
23
42

11

votes

5 answers

LinkedIn web scraping

I recently discovered a new R package for connecting to the LinkedIn API. Unfortunately the LinkedIn API seems pretty limited to begin with; for example, you can only get basic data on companies, and this is detached from data on individuals. I'd…

data-mining social-network-analysis crawling scraping

asked May 13 '15 at 21:01

christopherlovell

480
1
5
18

8

votes

5 answers

How to scrape a website with a search bar

How do I scrape a website that basically looks like Google, with just a giant search bar in the middle of the screen? From it you can search for various companies and their stats. I have a list of 1000 companies I want to get information about. I…

data-mining scraping crawling

asked May 13 '16 at 09:43

Ceylon

141
1
1
4

4

votes

2 answers

Web Scraping - a scientific database

I am searching a scientific database for abstracts of papers containing the words "project management". Here is the link: To get the abstracts, I need to click on each paper and open a new page. How can I do that for 68 papers? I program in R and…

r crawling scraping

asked Jun 29 '15 at 16:05

Hamideh

920
2
11
22

2

votes

4 answers

Format for storing textual data

For an upcoming project, I'm mining textual posts from an online forum, using Scrapy. What is the best way to store this text data? I'm thinking of simply exporting it into a JSON file, but is there a better format? Or does it not matter?

text-mining crawling

asked Jan 03 '15 at 22:38

cakesofwrath

21
1
2

2

votes

3 answers

Crawling customer reviews from Amazon

I want to know if there is any way to crawl customer reviews for particular products from Amazon without being blocked. At the moment, my crawler is blocked after a few tries. Any ideas would be appreciated.

scraping crawling

asked May 25 '17 at 16:32

bensw

189
1
4

2

votes

0 answers

How can I find company descriptions for a long list of companies?

I'm going to train an ML algorithm to qualify potential sales leads based upon company descriptions. To do this, I need to find the company descriptions programmatically. For example, given a long list of company names, how can I find descriptions for these…

data scraping crawling

asked May 09 '16 at 07:50

Per Borgen

21
1

2

votes

0 answers

Is there a way to scrape tweets in realtime from a list of specified users?

I am trying to build a scraper that will run continuously and save the tweets from a list of users instantaneously, or within seconds of each user tweeting. It could save the tweet details to a continuously updated CSV file.

data-mining scraping web-scraping crawling

asked Oct 18 '21 at 17:56

niusoski

21
2

1

vote

1 answer

Publicly available news APIs/datasets?

In addition to our list of publicly available datasets, I'd like to know if there is any list of publicly available news datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the data available…

dataset open-source crawling

asked Sep 05 '19 at 23:42

stevec

211
1
7

1

vote

2 answers

Data extraction using crawlers

I have a rather simple data scraping task, but my knowledge of web scraping is limited. I have an Excel file containing the names of 500 cities in a column, and I'd like to find their distance from a fixed city, say Montreal. I have found this…

web-scraping information-extraction crawling

asked Sep 26 '21 at 02:13

Jay

13
3

0

votes

1 answer

Corpus development for plagiarism detection

There are many simple plagiarism detection algorithms that work on top of search engines such as Google. I want to have an indexed corpus of the whole internet to serve as a back-end database for my plagiarism detection software. What should be the…

python crawling

asked Jul 01 '19 at 02:23

Shiva

9
2

0

votes

0 answers

Scraping Number of Customer Transactions Completed on a Marketplace

I'm looking to build a web scraper/crawler that counts the number of transactions completed on a given website that sells a given product. I know purchasing is typically handled by a 3rd party and not the website itself. I don't want to…

scraping web-scraping crawling

asked Nov 29 '22 at 19:49

Okeith

1

0

votes

1 answer

Is there a ubiquitous web crawler that can generate a good language-specific dataset for training a transformer?

It seems like a lot of noteworthy AI tools are being trained on datasets generated by web crawlers rather than human-edited, human-compiled corpora (Facebook Translate, GPT-3). In general, it sounds preferable to have an automatic and universal way…

nlp gpt crawling

asked Nov 18 '21 at 19:04

hmltn

131
3

-3

votes

4 answers

Looking for Web scraping tool for unstructured data

I want to scrape some data from a website. I have used import.io but am still not very satisfied. Can any of you suggest something? What's the best tool to get unstructured data from the web?

tools crawling

asked Aug 20 '14 at 14:12

cap

432
3
9

Questions tagged [crawling]