Data Fluency: Introduction to Web scraping


There is a lot of data on the web that is useful for scholarly research, but much of it is presented in a form designed for humans to read piecemeal in a web browser rather than structured for easy analysis by computers. Web scraping automates the extraction of data from the web into a structured format suitable for downstream analysis. From understanding political mobilisation or tracking earthquakes via Twitter, to extracting tabular data from online biological 'databases', or capturing the hottest dank memes to train your latest deep learning classifier, web scraping is a useful cross-disciplinary skill for almost anyone who works with data on a computer. This workshop will be taught in a similar style to Data Carpentry workshops. Data Carpentry’s mission is to train researchers in the core data skills for efficient, shareable, and reproducible research practices.

Content covered includes:

  • Brief Python refresher
  • How to run a script to get the data and export it into a .csv file
  • An outline of webpage structure and data formats, e.g. HTML, XML and JSON
  • Visual exploration of the DOM (Document Object Model)
  • Running URL and HTTP requests
  • HTML and API-based scraping, as well as wrangling, analysing and visualising the data
  • Legal and ethical considerations 
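
As a rough illustration of the "get the data and export it into a .csv file" step listed above, here is a minimal sketch using only the Python standard library. It is not taken from the workshop materials: the `TableParser` class, the `html_table_to_csv` function and the sample HTML are hypothetical names and data made up for this example, and the HTML is supplied as a string rather than downloaded so the sketch runs without a network connection.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Parse table rows out of an HTML string and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample page content, standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>place</th><th>magnitude</th></tr>
  <tr><td>Example Town</td><td>4.2</td></tr>
  <tr><td>Sample City</td><td>5.1</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

In practice the HTML string would come from an HTTP request (for example via `urllib.request.urlopen`), and the CSV text would be written to a file instead of printed; whether a given site may be scraped at all is a matter for the legal and ethical considerations listed above.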

At the end of the course, participants will be:

  • Familiar with the basics of scraping data from the web
  • Able to write their own web scraping code to approach straightforward data collection tasks

Who: This foundational-to-intermediate level workshop is aimed at beginners to web scraping, although familiarity with the Python programming language is strongly recommended to get the most out of the workshop.

What you'll need: A computer with speakers and a microphone (note: webcams and dual monitors are recommended but not required). A web browser and Zoom are the only required software. A Zoom link and instructions will be sent to registrants 2 days prior to the workshop.

Note: This registration page is only open to Monash external affiliate partners, MASSIVE partners and users, and Monash ARDC project partners and users.

Monash staff and Graduate Research students, please register through myDevelopment.

More Information
Contact Name: Data Fluency for Research