After a five-week break, I am back to blogging. Today we will go through Twitter sentiment analysis in R using the #RoyalWedding hashtag.

The last few years have seen an interesting revolution in social media. It is no longer just a platform where people talk to one another; it has become a platform where people:

  • Express interests
  • Share views
  • Show dissent
  • Praise or criticize companies or politicians

So in this article we will learn how to analyze what people are posting on Twitter and build a solution that helps us understand public sentiment.

How to create a Twitter app

Twitter has developed an API that we can use to analyze tweets posted by users and their underlying metadata. This API lets us extract data in a structured format that can easily be analyzed.

To create a Twitter app, you need a Twitter account. Once you have one, visit the Twitter app page and create an application to access the data. A step-by-step guide is available at the following link:

https://iag.me/socialmedia/how-to-create-a-twitter-app-in-8-easy-steps/

Once you have created the app, you will get the following four keys:
a. Consumer key (API key)
b. Consumer secret (API Secret)
c. Access Token
d. Access Token Secret

We will use these keys to extract data from Twitter for the analysis.
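Hard-coding the four keys in a script makes them easy to leak when the script is shared. As a minimal sketch, assuming you store the keys in environment variables, you could load them as shown below. The variable names (TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_SECRET) and the helper read_twitter_keys() are hypothetical choices, not part of twitteR.

```r
# read_twitter_keys(): hypothetical helper that reads the four Twitter
# credentials from environment variables instead of hard-coding them.
read_twitter_keys <- function(prefix = "TWITTER_") {
  fields <- c("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
  vars <- paste0(prefix, fields)
  values <- Sys.getenv(vars, unset = NA)            # NA for unset variables
  missing_vars <- vars[is.na(values) | values == ""]
  if (length(missing_vars) > 0) {
    stop("Missing environment variable(s): ", paste(missing_vars, collapse = ", "))
  }
  keys <- as.list(unname(values))
  names(keys) <- tolower(fields)                    # consumer_key, consumer_secret, ...
  keys
}

# Usage, assuming the variables are set (for example in ~/.Renviron):
# keys <- read_twitter_keys()
# setup_twitter_oauth(keys$consumer_key, keys$consumer_secret,
#                     keys$access_token, keys$access_secret)
```

Failing early with a clear message is easier to debug than an authentication error from the API later on.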

Implementing Sentiment Analysis in R

Now we will walk step by step through R code that extracts tweets from Twitter and performs sentiment analysis on them. We will use #RoyalWedding as our topic of analysis.

Extracting tweets using the Twitter application
Install the necessary packages

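# Install packages (tm, wordcloud, SnowballC and plotly are used later in the post)
install.packages("twitteR", repos = "http://cran.us.r-project.org")
install.packages("RCurl", repos = "http://cran.us.r-project.org")
install.packages("httr", repos = "http://cran.us.r-project.org")
install.packages("syuzhet", repos = "http://cran.us.r-project.org")
install.packages("tm", repos = "http://cran.us.r-project.org")
install.packages("wordcloud", repos = "http://cran.us.r-project.org")
install.packages("SnowballC", repos = "http://cran.us.r-project.org") # needed by tm's stemDocument()
install.packages("plotly", repos = "http://cran.us.r-project.org")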

# Load the required Packages
library(twitteR)
library(RCurl)
library(httr)
library(tm)
library(wordcloud)
library(syuzhet)

The next step is to set up Twitter API authentication with the app we created, passing the consumer keys along with the access tokens to get the data.

# authorisation keys
consumer_key = "ABCD12345690XXXXXXXXX" #Consumer key from twitter app
consumer_secret = "ABCD12345690XXXXXXXXX" #Consumer secret from twitter app
access_token = "ABCD12345690XXXXXXXXX" #access token from twitter app
access_secret ="ABCD12345690XXXXXXXXX" #access secret from twitter app

# set up
setup_twitter_oauth(consumer_key,consumer_secret,access_token, access_secret)
## [1] "Using direct authentication"
# search for tweets in english language
tweets = searchTwitter("#RoyalWedding", n = 10000, lang = "en")
# store the tweets into dataframe
tweets.df = twListToDF(tweets)

The above code authenticates with the Twitter app and extracts up to 10,000 English-language tweets containing #RoyalWedding. The Royal Wedding was the flavor of the season and the talk of the world, with everyone expressing their views on Twitter, which makes it a good topic for this exercise.

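Cleaning tweets for further analysis
We will use the gsub function to remove retweet markers, other Twitter handles, URLs, punctuation (including the # of hashtags), digits and extra whitespace, and iconv to drop junk (non-ASCII) characters such as emoji, so that only the text of each tweet remains for further analysis.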

# CLEANING TWEETS

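# drop non-ASCII characters (emoji etc.) first so the regexes below see valid text
tweets.df$text <- iconv(tweets.df$text, "UTF-8", "ASCII", sub="")
tweets.df$text = gsub("&amp", "", tweets.df$text)                                     # HTML-escaped ampersands
tweets.df$text = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", tweets.df$text, perl = TRUE)  # retweet/via markers with handles
tweets.df$text = gsub("@\\w+", "", tweets.df$text)                                    # remaining @handles
tweets.df$text = gsub("http\\S+", "", tweets.df$text)                                 # URLs (before punctuation is stripped)
tweets.df$text = gsub("[[:punct:]]", "", tweets.df$text)                              # punctuation, including the # of hashtags
tweets.df$text = gsub("[[:digit:]]", "", tweets.df$text)                              # digits
tweets.df$text = gsub("\\s+", " ", tweets.df$text)                                    # collapse runs of whitespace to one space
tweets.df$text = gsub("^\\s+|\\s+$", "", tweets.df$text)                              # trim leading/trailing whitespace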

Now we have only the relevant part of each tweet, which we can use for analysis.
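The cleaning steps can also be wrapped in a small reusable function, which makes it easy to check them on a single tweet before running them on the whole data frame. This is a sketch: clean_tweets() is a hypothetical helper name and the example tweet is made up. It removes URLs before punctuation and collapses repeated whitespace to a single space, so words are not glued together.

```r
# clean_tweets(): hypothetical helper applying tweet-cleaning steps to a
# character vector and returning the cleaned text.
clean_tweets <- function(text) {
  text <- iconv(text, "UTF-8", "ASCII", sub = "")                     # drop non-ASCII (emoji etc.)
  text <- gsub("&amp", "", text)                                      # HTML-escaped ampersands
  text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", text, perl = TRUE)  # retweet/via markers
  text <- gsub("@\\w+", "", text)                                     # remaining @handles
  text <- gsub("http\\S+", "", text)                                  # URLs
  text <- gsub("[[:punct:]]", "", text)                               # punctuation, incl. '#'
  text <- gsub("[[:digit:]]", "", text)                               # digits
  text <- gsub("\\s+", " ", text)                                     # collapse whitespace
  gsub("^\\s+|\\s+$", "", text)                                       # trim both ends
}

example_tweet <- "RT @someuser: Congratulations to the happy couple! #RoyalWedding https://t.co/abc123 2018"
clean_tweets(example_tweet)
# "Congratulations to the happy couple RoyalWedding"
```

On the real data this would be used as tweets.df$text <- clean_tweets(tweets.df$text).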

Getting sentiment scores for each tweet

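Let's score the emotions in each tweet. syuzhet's get_nrc_sentiment function scores text against the NRC lexicon in 10 categories: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) plus negative and positive sentiment.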

# Emotions for each tweet using NRC dictionary
emotions <- get_nrc_sentiment(tweets.df$text)
emo_bar = colSums(emotions)
emo_sum = data.frame(count=emo_bar, emotion=names(emo_bar))
emo_sum$emotion = factor(emo_sum$emotion, levels=emo_sum$emotion[order(emo_sum$count, decreasing = TRUE)])

After the above steps, we are ready to visualize which types of emotions dominate the tweets.

# Visualize the emotions from NRC sentiments
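library(plotly)
p <- plot_ly(emo_sum, x=~emotion, y=~count, type="bar", color=~emotion) %>%
  layout(xaxis=list(title=""), showlegend=FALSE,
         title="Emotion Type for hashtag: #RoyalWedding")
print(p)  # render the chart locally
# optional: publish to plotly Chart Studio (requires the plotly_username and
# plotly_api_key environment variables to be set)
api_create(p, filename="Sentimentanalysis")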

Here we see that the majority of people are talking positively about the Royal Wedding, which is a good indicator for our analysis.
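The bar chart counts NRC word matches, so one very emotional tweet can contribute several positive hits. One way to check the result at the tweet level, sketched here under the assumption that each row of the get_nrc_sentiment() output corresponds to one tweet (as in the emotions data frame above), is to label each tweet by the sign of its positive minus negative score. The helper label_tweets() and the toy numbers are hypothetical.

```r
# label_tweets(): hypothetical helper that labels each tweet by the sign of
# (positive - negative) in an NRC score table with one row per tweet.
label_tweets <- function(scores) {
  net <- scores$positive - scores$negative
  factor(ifelse(net > 0, "positive", ifelse(net < 0, "negative", "neutral")),
         levels = c("negative", "neutral", "positive"))
}

# Toy scores standing in for get_nrc_sentiment() output (made-up numbers)
toy_scores <- data.frame(positive = c(3, 0, 2, 1), negative = c(0, 2, 0, 1))
table(label_tweets(toy_scores))
# negative  neutral positive
#        1        1        2
```

With the real data, table(label_tweets(emotions)) gives the number of tweets in each group rather than the number of matched words.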

Lastly, let's see which words contribute to which emotion:

# Create comparison word cloud data

wordcloud_tweet = c(
  paste(tweets.df$text[emotions$anger > 0], collapse=" "),
  paste(tweets.df$text[emotions$anticipation > 0], collapse=" "),
  paste(tweets.df$text[emotions$disgust > 0], collapse=" "),
  paste(tweets.df$text[emotions$fear > 0], collapse=" "),
  paste(tweets.df$text[emotions$joy > 0], collapse=" "),
  paste(tweets.df$text[emotions$sadness > 0], collapse=" "),
  paste(tweets.df$text[emotions$surprise > 0], collapse=" "),
  paste(tweets.df$text[emotions$trust > 0], collapse=" ")
)

# create corpus
corpus = Corpus(VectorSource(wordcloud_tweet))

# remove punctuation, convert every word in lower case and remove stop words

corpus = tm_map(corpus, content_transformer(tolower))  # keeps documents as tm text documents
corpus = tm_map(corpus, removePunctuation)
corpus = tm_map(corpus, removeWords, c(stopwords("english")))
corpus = tm_map(corpus, stemDocument)  # requires the SnowballC package

# create document term matrix

tdm = TermDocumentMatrix(corpus)

# convert as matrix
tdm = as.matrix(tdm)

# column name binding (same order as the documents in wordcloud_tweet)
colnames(tdm) = c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'sadness', 'surprise', 'trust')

# keep only terms shorter than 11 characters so long tokens don't crowd the cloud
tdmnew <- tdm[nchar(rownames(tdm)) < 11, , drop = FALSE]
comparison.cloud(tdmnew, random.order=FALSE,
                 colors = c("#00B2FF", "red", "#FF0099", "#6600CC", "green", "orange", "blue", "brown"),
                 title.size=1, max.words=250, scale=c(2.5, 0.4),rot.per=0.4)

[Figure: comparison word cloud of #RoyalWedding tweets by emotion]

This is what the word cloud of #RoyalWedding tweets looks like. Using R, we can analyze sentiment on social media, and the same approach can be extended to a particular handle or product to see what people are saying about it and whether it is positive or negative.
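Two small extensions of the word-cloud step, sketched as hypothetical base-R helpers (they are not part of tm, wordcloud or syuzhet): build_emotion_docs() generates the eight per-emotion documents in a loop instead of eight separate paste() calls, and top_terms() lists the most frequent terms per emotion from a term-by-emotion count matrix such as tdmnew, which can be easier to read than the cloud itself.

```r
# build_emotion_docs(): one document per NRC emotion, made by pasting together
# every tweet that scores > 0 for that emotion.
build_emotion_docs <- function(text, scores,
                               emotion_names = c("anger", "anticipation", "disgust", "fear",
                                                 "joy", "sadness", "surprise", "trust")) {
  vapply(emotion_names,
         function(e) paste(text[scores[[e]] > 0], collapse = " "),
         character(1))                      # named character vector, one entry per emotion
}

# top_terms(): the n most frequent terms in each column of a
# term-by-emotion count matrix, highest count first.
top_terms <- function(m, n = 5) {
  out <- lapply(colnames(m), function(e) {
    counts <- m[, e]
    names(counts) <- rownames(m)            # keep term names even for a one-row matrix
    counts <- counts[counts > 0]
    head(sort(counts, decreasing = TRUE), n)
  })
  names(out) <- colnames(m)
  out
}

# Usage with the objects from this post:
# wordcloud_tweet <- build_emotion_docs(tweets.df$text, emotions)
# top_terms(tdmnew, n = 10)
```

Because the emotion names are listed once, the document order is guaranteed to match the column names given to the term-document matrix.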

Please feel free to ask any questions, or let me know if you want me to write on a specific topic.

Do subscribe to Tabvizexplorer.com to keep receiving regular updates.

Recently, my team asked whether we could retrieve Twitter data and build some visualizations from it for analysis. After going through several blogs, I came across a Tableau Junkie post on a web data connector (WDC) for Twitter and used it to extract the data. Please look at the dashboard below, which I created for #TrudeauInIndia from Twitter.

Following are the steps:

Step 1: Create your own Twitter API key to return the results

https://apps.twitter.com

Once you create your app, Twitter will provide access keys, which will be used to return search results.

For Tableau, we will need a web data connector for Twitter. I used the web data connector shared by Alex Ross on his blog, Tableau Junkie:

http://tableaujunkie.com/post/119681578798/creating-a-twitter-web-data-connector

You can download the code and host the connector on your own website or on localhost. There is also an option to change the number of tweets returned.

Step 2: I used XAMPP on my machine and pasted the files into the htdocs folder to host the connector on localhost.

Step 3: Start Tableau → More → Web Data Connector, then enter the URL where you are hosting the files. Most probably it will be:

http://localhost/TwitterSearch/twitterwebconnect.html

Enter the hashtag or Twitter account you want to search.

Step 4: Go to a sheet and start creating your data visualization.

I hope this helps you generate Twitter trends using Tableau in the future.