Data Fluency: Introduction to Web scraping
A great deal of data on the web is useful for scholarly research, but much of it is presented for humans to read piecemeal in a web browser rather than structured for easy analysis by computers. Web scraping automates the extraction of data from web pages into a structured format suitable for downstream analysis. From understanding political mobilisation or tracking earthquakes via Twitter, to extracting tabular data from online biological 'databases', or capturing the hottest dank memes to train your latest deep learning classifier, web scraping is a useful cross-disciplinary skill for almost anyone who works with data on a computer. This workshop will be taught in a similar style to Data Carpentry workshops. Data Carpentry’s mission is to train researchers in the core data skills for efficient, shareable, and reproducible research practices.
Content covered includes:
- Brief Python refresher
- How to run a script to get the data and export it into a .csv file
- An outline of webpage structure, e.g. HTML, XML and JSON
- Visual exploration of the DOM (Document Object Model)
- Running URL and HTTP requests
- HTML- and API-based scraping, as well as wrangling, analysing and visualising the data
- Legal and ethical considerations
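As a rough taste of the "get the data and export it into a .csv file" step, here is a minimal sketch, not taken from the workshop materials. It assumes the HTML has already been downloaded, so it parses a small made-up table (the `SAMPLE_HTML` string, the town names and the values are all hypothetical) using only the Python standard library. The helper names `TableParser` and `table_to_csv` are illustrative.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>town</th><th>magnitude</th></tr>
  <tr><td>Exampleville</td><td>4.1</td></tr>
  <tr><td>Sampleton</td><td>5.3</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

To save the result to disk, the same rows could be written to a file opened with `open("output.csv", "w", newline="")` instead of the in-memory buffer.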
At the end of the course, participants will be:
- Familiar with the basics of scraping data from the web
- Able to write their own web scraping code to approach straightforward data collection tasks
Who: This foundational-to-intermediate level workshop is aimed at beginners to web scraping. Familiarity with the Python programming language is strongly recommended to get the most out of the workshop.
What you'll need: A computer with speakers and a microphone (note: webcams and dual monitors are recommended but not required). A web browser and Zoom are the only required software. A Zoom link and instructions will be sent to registrants 2 days prior to the workshop.
Note: This registration page is only open to Monash external affiliate partners, MASSIVE partners and users, Monash ARDC project partners and users.
Monash staff and Graduate Research students, please register through myDevelopment.
Date
TBA
Contact Name | Data Fluency for Research |
---|---|
Contact Email | datafluency@monash.edu |