Natural language classification with Python - best library/resource for determining sentiment associated with words?
So, I’ve been working through this tutorial; I understand it and am impressed with NLTK. However, my natural language processing goals are somewhat more ambitious.
What I’m doing is analyzing reviews of different video games, to see how different games are reviewed differently. I have large corpora of reviews for several games that seem to have had “normal” review cycles, and one big corpus for a game that got review bombed. I want to do a couple of things:
- Compare the “normal” corpus with the “controversial” corpus and see what stands out most.
- Determine word associations, both sentiment-wise (is “graphics” more positive for some games than others?) and in terms of which words co-occur within each corpus (“graphics” and “nextgen”, maybe). The thing is, I don’t have a list of words to check; I want to find these associations emergently.
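To make the first bullet concrete, the naive thing I can picture is a smoothed log-ratio of word frequencies between the two corpora; this is just a sketch, and the `normal_text` / `controversial_text` names are placeholders for however the corpora end up loaded:

```python
from collections import Counter
import math
import re

def top_distinctive(corpus_a, corpus_b, n=10, min_count=5):
    """Rank words that occur disproportionately often in corpus_a relative
    to corpus_b, using a smoothed log-ratio of relative frequencies."""
    counts_a = Counter(re.findall(r"[a-z']+", corpus_a.lower()))
    counts_b = Counter(re.findall(r"[a-z']+", corpus_b.lower()))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    scores = {}
    for word, c_a in counts_a.items():
        if c_a < min_count:
            continue  # skip rare words, which dominate raw ratios
        c_b = counts_b.get(word, 0)
        # add-one smoothing so words absent from corpus_b don't divide by zero
        scores[word] = math.log(((c_a + 1) / (total_a + 1)) /
                                ((c_b + 1) / (total_b + 1)))
    return sorted(scores, key=scores.get, reverse=True)[:n]

# e.g. top_distinctive(controversial_text, normal_text) would surface words
# that stand out in the review-bombed corpus
```

Nothing fancy, but it would at least surface candidate words emergently without me supplying a list up front.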
Anyone able to advise how I should proceed? I’ve used NLTK both as shown in this tutorial and through TextBlob, but I’ve also used TensorFlow’s CPU-based classification model and spaCy, the last of which is my favorite. I’m wondering if gensim, which I played around with briefly but couldn’t tell whether it’s specialized for what I’m doing, might be what I need.
Relatedly: I have all of my data stored as individual reviews, each associated with a particular video game, in a Neo4j instance that I can query at any time. For these big corpora, is there any value in running text classification on individual reviews? I wouldn’t have bothered; I’d just combine them into one giant corpus to be broken apart later. But only TensorFlow has been able to handle even my smaller corpus (about 3MB of text; the controversial game is almost 20MB of raw reviews).
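For context, the per-review route I’m asking about would look something like this in spaCy, streaming documents one at a time rather than joining everything into one giant string (I’m using the tokenizer-only `spacy.blank("en")` pipeline in this sketch; a loaded model like `en_core_web_sm` would work the same way, and `reviews` stands in for strings pulled out of Neo4j):

```python
import spacy
from collections import Counter

# tokenizer-only pipeline; swap in spacy.load("en_core_web_sm")
# if tagging or parsing is needed
nlp = spacy.blank("en")

def word_counts(reviews):
    """Stream reviews through spaCy individually instead of concatenating
    them into one enormous document."""
    counts = Counter()
    for doc in nlp.pipe(reviews, batch_size=50):
        counts.update(tok.lower_ for tok in doc if tok.is_alpha)
    return counts
```

If I understand correctly, spaCy caps single-document length (`nlp.max_length` defaults to 1,000,000 characters), which might explain why my multi-megabyte combined corpora choke everything but TensorFlow; so maybe keeping reviews separate isn’t just viable but necessary?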
Any help would be deeply appreciated! Cheers, Ellie