
I am quite new to R and am trying to learn web scraping. I need to extract documents from this website.

Ideally, the data needs to be structured in three columns: YEAR, DATE, and INTRODUCTORYSTATEMENT_CONTENT. Can anyone help with the coding?

Stephen Rauch
Rollo99

1 Answer


This should be possible with rvest in R. Two things make it possible:

  1. The URL pattern is predictable: https://www.ecb.europa.eu/press/pressconf/2012/html/index.en.html (replace 2012 with other years).
  2. The HTML page applies predictable CSS classes to the INTRODUCTORYSTATEMENT_CONTENT (e.g. doc-title and doc-subtitle).
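A minimal sketch of both points with rvest (version 1.0 or later, for html_elements() and html_text2()). The helper names year_index_url, extract_statements and scrape_years are hypothetical. Mapping .doc-subtitle to DATE and .doc-title to INTRODUCTORYSTATEMENT_CONTENT is an assumption, so check both selectors against the live page with your browser's inspector, because the site's markup may have changed.

```r
library(rvest)

# Hypothetical helper: build a yearly index URL from the predictable pattern
# (the example above uses 2012; other years are substituted in).
year_index_url <- function(year) {
  sprintf("https://www.ecb.europa.eu/press/pressconf/%d/html/index.en.html", year)
}

# Hypothetical helper: return one row per statement from a parsed page.
# ASSUMPTION: .doc-subtitle holds the date and .doc-title holds the statement
# text. Verify these selectors on the real page before relying on them.
extract_statements <- function(page, year) {
  dates <- html_text2(html_elements(page, ".doc-subtitle"))
  texts <- html_text2(html_elements(page, ".doc-title"))
  # Pairing by position only makes sense if both selectors match equally often.
  stopifnot(length(dates) == length(texts))
  data.frame(
    YEAR = rep(year, length(texts)),
    DATE = dates,
    INTRODUCTORYSTATEMENT_CONTENT = texts,
    stringsAsFactors = FALSE
  )
}

# Download each year's index page and stack the rows into one data frame.
scrape_years <- function(years, pause = 1) {
  rows <- lapply(years, function(y) {
    page <- read_html(year_index_url(y))
    Sys.sleep(pause)  # wait between requests to be polite to the server
    extract_statements(page, y)
  })
  do.call(rbind, rows)
}
```

You would call it as, for example, statements <- scrape_years(2012:2014). If the yearly index only lists titles and links, collect the links with html_attr(html_elements(page, "a"), "href") and apply the same idea to each statement page. Also note that read_html() does not run JavaScript, so any content the site loads dynamically will not appear in the parsed page.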


The following articles have examples:

https://towardsdatascience.com/web-scraping-tutorial-in-r-5e71fd107f32
https://www.datacamp.com/community/tutorials/r-web-scraping-rvest
https://www.analyticsvidhya.com/blog/2017/03/beginners-guide-on-web-scraping-in-r-using-rvest-with-hands-on-knowledge/

Shamit Verma
  • Can you please take a look at this question? https://stackoverflow.com/questions/66996370/r-error-in-f-x1l-y1l-scheme-not-supported-in-url-na thanks! – stats_noob Apr 08 '21 at 03:43