The Pokémon community is happy! Or that’s the impression I got. On January 28, 2022, the latest Pokémon game, titled Pokémon Legends: Arceus, was released. The community embraced the game. Players called it the most innovative Pokémon title in years, its reviews are the best the series has seen in a long time, and I’ve logged over 100 hours in a month. So, yes, I got a good impression. But are the other players enjoying the game? Moreover, what are they saying about it? I'll answer these questions with data.

A few years ago, the Pokémon community was upset. During the announcement of the previous installment of the series, Pokémon Sword and Shield, the developer Game Freak reported they would not include all the existing Pokémon in the game. The fans didn't like this; they were angry, sad, and disappointed. In an attempt to quantify this discomfort, I collected and analyzed a set of tweets containing hashtags related to the game and found that, on average, they were slightly negative. So now, I want to offset that experiment's negative feeling.

My gut feeling is telling me the fans are enjoying this new game. But I wanted more evidence. And so, once again, I collected tweets related to the game to investigate whether their content was positive. In this new experiment, I used the natural language processing (NLP) Python library spaCy, Google Cloud's Natural Language API, and the R language to discover the tweets’ most used nouns and adjectives, the tweets' sentiment, and the most mentioned Pokémon. Here, I’ll present the results of my investigation.

You can find the investigation’s source code at: https://github.com/juandes/pokemon-legends-tweets

About the data

My dataset has 173296 tweets I collected from February 4, 2022, to February 27, 2022, with the hashtags #PokemonLEGENDS or #PokemonLegendsArceus. Most of the tweets are retweets, so I cleaned up the dataset and ended up with 40282 unique ones. Even so, you'll still find near-duplicate spammy tweets that differ by only a few words. (I could have used other techniques to catch these cases but decided against it in the end.)
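The retweet cleanup described above could look roughly like the following. This is only a minimal sketch and not necessarily what the script in the repository does: the `normalize` and `unique_tweets` helpers, and the assumption that retweets start with an "RT @user:" prefix, are mine.

```python
import re

# Assumption: retweets are stored with the classic "RT @user: " prefix.
RT_PREFIX = re.compile(r"^RT @\w+:\s*")


def normalize(text):
    """Strip the retweet prefix, lowercase, and collapse whitespace."""
    text = RT_PREFIX.sub("", text)
    return " ".join(text.lower().split())


def unique_tweets(tweets):
    """Keep the first tweet seen for each normalized text."""
    seen = set()
    unique = []
    for text in tweets:
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Exact matching on the normalized text is what lets spammy tweets that differ by a few words slip through; catching those would need some form of fuzzy matching.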

I collected the data using the Python library Tweepy, and you can find the script I used in the linked repository.

Top nouns and adjectives

One of spaCy's features is part-of-speech (POS) tagging, the task of assigning a grammatical category to each token of a document. Two of these categories, and the ones I used, are nouns and adjectives. As a refresher, nouns are words that name objects (for instance, a tree), people, actions, feelings, and anything we would generally call a "thing," while adjectives describe the nouns, e.g., the word "large" in the phrase "the large tree." Using Python and spaCy, I wrote a script that goes through each tweet to tag its terms and count their occurrences. Then, I selected the top 15 of each category and visualized them in two charts (Figures 1 and 2), shown below.
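As a rough illustration of the tagging-and-counting step, here is a minimal sketch. It is not the repository's script: the helper names, the choice of the `en_core_web_sm` model, and the decision to count lowercased lemmas are all assumptions on my part.

```python
from collections import Counter


def tag_tweets(tweets, model_name="en_core_web_sm"):
    """Run spaCy over the tweets (model name is an assumption)."""
    import spacy  # imported here so the counting helper works without spaCy

    nlp = spacy.load(model_name)
    return list(nlp.pipe(tweets))


def count_by_pos(docs, pos):
    """Count lowercased lemmas of alphabetic tokens whose coarse POS tag is `pos`."""
    counts = Counter()
    for doc in docs:
        for token in doc:
            if token.pos_ == pos and token.is_alpha:
                counts[token.lemma_.lower()] += 1
    return counts


def top_terms(docs, pos, n=15):
    """Return the n most frequent terms for a POS tag such as "NOUN" or "ADJ"."""
    return count_by_pos(docs, pos).most_common(n)
```

With the model installed, `top_terms(tag_tweets(tweets), "NOUN")` would give the data behind a chart like Figure 1, and passing `"ADJ"` the data for Figure 2.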

Figure 1: Top 15 used nouns.

The top 15 nouns are a varied group that, among other things, covers some of the game’s mechanics. One of these is “alpha,” referring to Alpha Pokémon, a new kind of red-eyed Pokémon that are larger and stronger than their usual counterparts. Other mechanics mentioned are “outbreak,” an event that spawns a swarm of the same Pokémon in a specific location, and “shinie/shiny,” a rare variant of a Pokémon with a different color than its regular version. The tweeters often mentioned these three mechanics within the same message to celebrate and share their recently caught Shiny Alpha Pokémon, an elusive kind of Pokémon whose appearance rate increases during an outbreak. Under normal circumstances, the chance of finding a shiny Pokémon is 1/4096, but during an outbreak (which might spawn Alpha Pokémon), the chance increases to 1/158.2. So yes, I can understand why people would flex their low-probability creatures on Twitter.
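To put those odds in perspective, here is a small back-of-the-envelope sketch. It uses only the two rates quoted above; the encounter counts and the assumption that every encounter is an independent roll are mine.

```python
# Shiny odds quoted in the text above.
BASE_ODDS = 1 / 4096       # normal circumstances
OUTBREAK_ODDS = 1 / 158.2  # during an outbreak


def chance_of_at_least_one(p, encounters):
    """Probability of at least one shiny in `encounters` independent tries."""
    return 1 - (1 - p) ** encounters


# How many times more likely a shiny is during an outbreak.
improvement = OUTBREAK_ODDS / BASE_ODDS

if __name__ == "__main__":
    print(f"An outbreak makes a shiny about {improvement:.1f} times more likely")
    for n in (10, 50, 100):  # hypothetical numbers of encounters
        base = chance_of_at_least_one(BASE_ODDS, n)
        outbreak = chance_of_at_least_one(OUTBREAK_ODDS, n)
        print(f"{n:>3} encounters: {base:.2%} normally vs. {outbreak:.2%} in an outbreak")
```

The ratio works out to roughly 26, which is why outbreak hunting shows up so often in the tweets.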

Figure 2: Top 15 used adjectives.

The top adjectives (seen above) are positive. There are terms like “good,” “cute,” “happy,” “nice,” and “great,” which convey a warm, fuzzy feeling. There’s also “shiny,” which we can use as an adjective to describe a shiny Pokémon, e.g., “the Pokémon is shiny,” instead of using the noun “Shiny Pokémon.” But amid those happy words, one stands out: “bad.” To know what exactly is bad, I searched for tweets containing that term. However, instead of hate, I found more warm and fuzzy comments, for they weren't using the term negatively. Some tweets exclaimed how badly they wanted the game or to catch a shiny Pokémon, others said, “I suck so bad” (at the game), and one was celebrating how they beat “the big bad boss battle.” Congrats to you, my friend!

A normal Scyther (left) and its shiny variant (right). Note the pink neck and darker greens. Photo by me.

Sentiment analysis

From the adjectives above, I could conclude that people enjoy the game. But that’s a subjective conclusion, one that’s tainted by my idea of what a positive adjective is (I’m exaggerating here, but I want an excuse to introduce my next method) and by what I saw after reading some tweets. To work around this subjectivity, I used a natural language processing technique named sentiment analysis. This method scores each tweet with a number that measures its prevailing emotional attitude, allowing us to determine whether its feeling is negative, neutral, or positive. The service I used for the sentiment analysis, Google Cloud’s Natural Language API, rated each tweet with a score between -1 and 1, where -1 implies an overall negative emotion, 0 indicates low or mixed emotions, and 1 a positive emotion. For example, the sentence “I HATE YOU PARAS” has a score of -0.9; “Shiny Aipom in an Outbreak,” 0.0; and “I love the expressions in this game XD,” 0.9.
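For readers who want to try the scoring step themselves, the following is a minimal sketch using the `google-cloud-language` Python client. It assumes the package is installed and Google Cloud credentials are configured; the `score_tweet` and `label_sentiment` helpers, and the 0.25 cutoff used to bucket scores, are hypothetical choices of mine and not something the API or my analysis prescribes.

```python
def score_tweet(text):
    """Return the document-level sentiment score (-1 to 1) for one tweet."""
    # Imported lazily so the labeling helper below works without the client.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_sentiment(request={"document": document})
    return response.document_sentiment.score


def label_sentiment(score, cutoff=0.25):
    """Bucket a score into a label; the cutoff is an arbitrary assumption."""
    if score <= -cutoff:
        return "negative"
    if score >= cutoff:
        return "positive"
    return "neutral"
```

With these helpers, the three example scores above (-0.9, 0.0, and 0.9) would be labeled negative, neutral, and positive, respectively.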

The mean sentiment value of the tweet sample is 0.141. So, at first glance, the tweets lean positive on average, which is what my gut feeling said. (I say “sample” because I selected 3000 random tweets from the dataset to speed up the score gathering process and because the service I used is not free.) The median score is 0.100 (which also leans toward positive), the 25th percentile is -0.100, the 75th percentile is 0.5, and the standard deviation is 0.483. Figure 3 shows a histogram of the scores' distribution. Most values fall between 0.0 and 0.9, a wide range that is consistent with the high standard deviation of 0.483. Or, in simple words, we could say there are many neutral and positive tweets and fewer negative ones.

Figure 3: Histogram of the tweets' sentiment score.
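The sampling and summary statistics reported above can be sketched with Python's standard library alone. This is an illustration rather than my actual analysis code: the helper names and the seed are hypothetical, and I am assuming the sample standard deviation (the n - 1 version) and the `statistics` module's default quantile method.

```python
import random
import statistics


def sample_tweets(tweets, k=3000, seed=42):
    """Draw up to k tweets at random without replacement (seed is arbitrary)."""
    rng = random.Random(seed)
    return rng.sample(tweets, min(k, len(tweets)))


def summarize(scores):
    """Mean, median, quartiles, and sample standard deviation of the scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(scores),
    }
```

Different tools compute percentiles with slightly different interpolation rules, so the quartiles from this sketch may not match another tool's output to the last decimal.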

Before moving on to the next section, I need to say that artificial intelligence, and thus NLP algorithms, can be biased. So, I cannot firmly say the tweets are indeed positive. Furthermore, since many AI systems are trained on corpora of common text such as Wikipedia and not on tweets about Pokémon, I expect some of the sentiment scores to be inaccurate. The table below has ten tweets and their sentiment scores; be the judge and let me know your thoughts about the scores.

Some tweets and their sentiment score.
An angry Alpha Snorlax who isn't looking that positive. Photo by me.

Top mentioned Pokémon

This last section is pure fan service. I won’t introduce another NLP technique and won’t mention feelings. No. Here, I’ll present the top 15 most mentioned Pokémon among all the tweets. Follow me to Figure 4.

Figure 4: Top 15 mentioned Pokémon.

The tweets’ most popular Pokémon is Arceus, with 2131 mentions, almost six times as many as the runner-up, Eevee. But it is somewhat expected to have Arceus at the top, considering the game is named after it. Removing the eponymous Pokémon from the list yields a chart (Figure 5) where Eevee takes the first spot with 170 more mentions than the new runner-up, Typhlosion.
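Counting mentions like these can be done with a simple pattern match. The sketch below is one possible approach and not necessarily how I produced the charts: the `count_mentions` helper is hypothetical, it assumes you supply your own list of Pokémon names, and it counts every occurrence (so a tweet naming Arceus twice adds two).

```python
import re
from collections import Counter


def count_mentions(tweets, names):
    """Count case-insensitive, whole-word occurrences of each name."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    # Map lowercase matches back to the spelling given in `names`.
    canonical = {name.lower(): name for name in names}
    counts = Counter()
    for text in tweets:
        for match in pattern.findall(text):
            counts[canonical[match.lower()]] += 1
    return counts
```

Calling `most_common(15)` on the result gives the data for a chart like Figure 4, and deleting the "Arceus" key before doing so gives the variant without the eponymous Pokémon.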

Figure 5: Top 15 mentioned Pokémon (excluding Arceus).
My character and Arceus. Photo by me.

Final words

Pokémon Legends: Arceus is out, and the consensus says it is a good game. Its current Metascore is 83 (out of 100), and the user score is 8.3 out of 10. For comparison, the previous title’s Metascore is 73, and its user score is 5.4. Beyond the ratings, the game’s buzz has been positive. That’s the impression I got on Reddit, Discord, and Twitter. To check if my intuition was right, I collected a corpus of tweets and analyzed it with NLP techniques to discover what people were saying about the game and how they said it.

The outcome was positive. People were mostly talking about the game’s new features and showing off their shiny and gigantic Pokémon using pleasant terms such as “good,” “cute,” and “happy.” Then, I used a sentiment analysis service to quantify the messages’ emotional attitude, and once again, the results leaned towards the positive side.

It’s good to see the community this happy.