Text Mining In R

This course covers intermediate concepts in natural language processing, equipping learners to clean and process large amounts of text data, group text into categories and topics, and find similarities between documents.

4 hours of instruction

OBJECTIVES

  1. Compute cosine similarity for corpus documents
  2. Demonstrate weighting with TF-IDF
  3. Implement cosine similarity to compare documents
  4. Visualize similar documents using an interactive network graph

PREREQUISITES

Foundational understanding of NLP concepts.

SYLLABUS & TOPICS COVERED

  1. Cosine Similarity
    • Compute term similarity matrix
    • Create corpus term similarity heatmap
  2. TF-IDF
    • Implementation of TF-IDF weighting
    • Build network graphs to compare the documents

SOFTWARE REQUIREMENTS

You will have access to an R-based Posit Cloud environment for this course. No additional download or installation is required.

About the Instructor

DataSociety