6 hours of instruction
This course covers intermediate concepts in natural language processing (NLP), equipping learners to clean and process large amounts of text data, group text into categories and topics, and find similarities between documents. Because natural language can be vague and subjective, the course also presents ways to evaluate and interpret the resulting language models.
OBJECTIVES
- Understand and implement bag-of-words and term frequency-inverse document frequency (TF-IDF) representations
- Process, clean, and format text data for analysis
- Extract key summary metrics and words from a corpus of documents
- Perform latent Dirichlet allocation (LDA) for topic modeling
PREREQUISITES
Introduction to NLP
SYLLABUS & TOPICS COVERED
- TF-IDF
  - The ‘bag-of-words’ approach and when it is used
  - Weighting terms in a corpus
  - Implementation of TF-IDF weighting
- Topic Modeling
  - Topic modeling
  - Latent Dirichlet allocation (LDA) as a topic modeling technique
  - Implementation of LDA
SOFTWARE REQUIREMENTS
You will have access to a Python-based JupyterHub environment for this course. No additional download or installation is required.