Text Mining with R Notes
Textbook
Tutorial
- Sentiment Analysis: Introduction to the Syuzhet Package
Definitions
- A token is a meaningful unit of text, most often a word, that we are interested in using for further analysis, and tokenization is the process of splitting text into tokens.
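To make the definition concrete, here is a minimal base-R sketch of word tokenization. The function name `tokenize_simple()` is hypothetical, and the splitting rule (lowercase the text, keep only letters, digits and apostrophes) is an assumption for illustration. It is not how `unnest_tokens()` tokenizes internally.

```r
# Hypothetical helper: a very simple word tokenizer (a sketch, not tidytext's method).
# Lowercases the text, replaces every run of characters that are not letters,
# digits or apostrophes with a space, then splits on whitespace.
tokenize_simple <- function(text) {
  cleaned <- gsub("[^a-z0-9']+", " ", tolower(text))
  strsplit(trimws(cleaned), "\\s+")  # one character vector of tokens per input string
}

tokenize_simple(c("The phone has scratches.", "The phone has no scratches."))
```

Each input string becomes a vector of lowercase word tokens, with the trailing period dropped.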
Key functions
- unnest_tokens(): tokenizes text and returns a one-token-per-row (here, one-word-per-row) format.
- anti_join(get_stopwords()): removes stop words (available in tidy form from the function get_stopwords()) with an anti_join.
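A minimal sketch of the two key functions used together, assuming the tidytext and dplyr packages are installed. The data frame `docs` and its columns `doc` and `text` are made up for illustration.

```r
library(dplyr)
library(tidytext)

# Made-up example data: one document per row
docs <- tibble(
  doc  = 1:2,
  text = c("The phone has scratches.", "The phone has no scratches.")
)

# unnest_tokens(output, input): one lowercase word per row, punctuation dropped
tokens <- docs %>%
  unnest_tokens(word, text)

# get_stopwords() returns a tibble with a `word` column (snowball list by default),
# so anti_join() keeps only the rows whose word is NOT a stop word
content_words <- tokens %>%
  anti_join(get_stopwords(), by = "word")

content_words
```

The default snowball stop word list also contains the negator "no", so both sentences reduce to the same two content words ("phone", "scratches"). This matters if the tokens are later used for word-level sentiment scoring.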
R code
# Loading necessary libraries
library(sentimentr)
library(dplyr)
library(magrittr)
# Example text
mytext <- c("The phone has scratches.", "The phone has no scratches.")
# Converting text into sentences
mytext <- get_sentences(mytext)
# Performing sentiment analysis
sentiment(mytext)
## element_id sentence_id word_count sentiment
## 1: 1 1 4 -0.3000000
## 2: 2 1 5 0.2683282
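The two scores above are consistent with sentimentr's scoring, in which the negation-adjusted polarity of a sentence's words is summed and divided by the square root of the sentence's word count. The sketch below checks that arithmetic under two assumptions: "scratches" carries a polarity of -0.6 (a value inferred from the printed output, not looked up in the lexicon), and the single negator "no" flips its sign.

```r
# Hypothetical reconstruction of the two sentimentr scores shown above.
# ASSUMPTION: polarity of "scratches" = -0.6 (inferred from the output, not the lexicon)
polarity_scratches <- -0.6

# Sentence 1: "The phone has scratches." -> 4 words, no negator
score1 <- polarity_scratches / sqrt(4)

# Sentence 2: "The phone has no scratches." -> 5 words;
# ASSUMPTION: one negator ("no") multiplies the polarity by (-1)^1
score2 <- (-1)^1 * polarity_scratches / sqrt(5)

round(c(score1, score2), 7)
# [1] -0.3000000  0.2683282
```

The word count in the denominator explains why the positive score (0.268) is smaller in magnitude than the negative one (0.3): sentence 2 has one more word.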
library(syuzhet)
##
## Attaching package: 'syuzhet'
## The following object is masked from 'package:sentimentr':
##
## get_sentences
# Example sentences
sentences <- c("The phone has scratches.", "The phone has no scratches.")
# Get sentiment scores
sentiment_scores <- get_nrc_sentiment(sentences)
# View scores
sentiment_scores
## anger anticipation disgust fear joy sadness surprise trust negative positive
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0
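# Syuzhet does not work well here: every NRC score is 0 because no word in either
# sentence matched the NRC lexicon, and the word-level lookup ignores negators such as "no"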
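Even with better lexicon coverage, a word-by-word lookup would still ignore negation. The base-R sketch below uses a hypothetical toy lexicon (`toy_lexicon`, not the NRC lexicon) in which "scratches" is listed as negative, plus a hypothetical helper `count_categories()` that counts lexicon matches one word at a time, roughly the way a lexicon-count method works.

```r
# Hypothetical toy lexicon (NOT the NRC lexicon): each word maps to one category
toy_lexicon <- data.frame(
  word     = c("scratches", "broken", "great"),
  category = c("negative", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count lexicon matches per category, one word at a time,
# with no look-back for negators such as "no" or "not"
count_categories <- function(sentences, lexicon) {
  cats <- sort(unique(lexicon$category))
  counts <- vapply(sentences, function(s) {
    # lowercase, turn non-letters into spaces, split into words
    words <- strsplit(trimws(gsub("[^a-z']+", " ", tolower(s))), "\\s+")[[1]]
    # category of each matched word; unmatched words become NA
    hits <- lexicon$category[match(words, lexicon$word)]
    # table() drops the NAs, so only matched words are counted
    as.integer(table(factor(hits, levels = cats)))
  }, integer(length(cats)), USE.NAMES = FALSE)
  counts <- t(counts)  # one row per sentence
  colnames(counts) <- cats
  counts
}

sentences <- c("The phone has scratches.", "The phone has no scratches.")
count_categories(sentences, toy_lexicon)
```

Both rows come out as negative = 1, positive = 0, so "no scratches" is scored exactly like "scratches". By contrast, sentimentr's valence shifters turned the second sentence positive.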