Topic Modeling in NLP

This course covers intermediate concepts in natural language processing, equipping learners to clean and process large amounts of text data, sort text into groups and topics, and measure the similarity between documents. Because natural language can be vague and subjective, the course also presents ways to evaluate and interpret these models.

6 hours of instruction

OBJECTIVES

  1. Understand and implement the bag-of-words model and term frequency-inverse document frequency (TF-IDF)
  2. Process, clean, and format text data for analysis
  3. Extract key summary metrics and words from a corpus of documents
  4. Perform latent Dirichlet allocation (LDA) for topic modeling
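Objectives 1 to 3 can be illustrated with a minimal, standard-library sketch of text cleaning, bag-of-words counts, a corpus-level word-frequency summary, and TF-IDF weighting. The toy corpus, the small stop-word list, and the helper names `clean`, `bag_of_words`, and `tf_idf` are hypothetical, not the course's own code. This uses the plain tf * log(N / df) variant; libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (an assumption; the course's datasets are not shown).
DOCS = [
    "The cat sat on the mat.",
    "The dog sat on the log!",
    "Cats and dogs are pets.",
]

# A tiny illustrative stop-word list (assumption; real projects use a fuller list).
STOP_WORDS = {"the", "on", "and", "are", "a", "of"}


def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Bag of words: count each token, ignoring word order."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Weight each term in each document by tf(t, d) * log(N / df(t))."""
    n_docs = len(docs_tokens)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = bag_of_words(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs_tokens = [clean(d) for d in DOCS]

# Summary metric: the most frequent words across the whole corpus.
corpus_counts = Counter(t for tokens in docs_tokens for t in tokens)
print(corpus_counts.most_common(2))

for doc_weights in tf_idf(docs_tokens):
    # Highest-weighted term per document (ties go to the first term seen).
    print(max(doc_weights, key=doc_weights.get))
```

Under this weighting, a term that appears in every document gets log(N / N) = 0, so words shared across documents (here "sat") are down-weighted relative to words found in only one document.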

PREREQUISITES

Introduction to NLP

SYLLABUS & TOPICS COVERED

  1. TF-IDF
    • The ‘bag-of-words’ approach and when it is used
    • Weighting terms in a corpus
    • Implementation of TF-IDF weighting
  2. Topic Modeling
    • Introduction to topic modeling
    • Latent Dirichlet allocation (LDA) as a topic modeling technique
    • Implementation of LDA
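The LDA topics above can be illustrated with a from-scratch sketch that fits the model by collapsed Gibbs sampling, one common inference method for LDA (scikit-learn's `LatentDirichletAllocation`, for example, uses variational inference instead). The function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are illustrative assumptions, not the course's implementation.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    docs: list of already-cleaned token lists.
    Returns (doc_topic, topic_word): per-document topic proportions (lists)
    and per-topic word distributions (dicts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic

    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word


# Hypothetical toy corpus with two loose themes (pets and markets).
docs = [
    ["cat", "dog", "pet", "cat"],
    ["dog", "pet", "leash"],
    ["stock", "market", "price"],
    ["market", "price", "trade", "stock"],
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(topic_word):
    # Print the three most probable words in each topic.
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

On a corpus this small the sampler may not separate the themes cleanly; results depend on the seed, the number of iterations, and the `alpha` and `beta` priors.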

SOFTWARE REQUIREMENTS

You will have access to a Python-based JupyterHub environment for this course. No additional download or installation is required.

About the Instructor

DataSociety
