Natural language classification with Python - best library/resource for determining sentiment associated with words?
So, I’ve been working through this tutorial; I understand it and am impressed with NLTK. However, my natural language processing goals are somewhat more ambitious.
What I’m doing is analyzing reviews of different video games, to see how different games are reviewed differently. I have large corpora of reviews for several games that seem to have had “normal” review cycles, and one big corpus for a game that got review bombed. I want to do a couple of things:
- Compare the “normal” corpus with the “controversial” corpus and see what stands out most.
- Determine word associations, both sentiment-wise (is “graphics” more positive for some games than others?) and in terms of which words co-occur within each corpus (“graphics” and “nextgen”, maybe). The thing is, I don’t have a list of words to check; I want to find these associations emergently.
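To make the first bullet concrete, the naive thing I can picture is a smoothed log-ratio of word frequencies between the two corpora; this is just a sketch, and the `normal_text` / `controversial_text` names are placeholders for however the corpora end up loaded:

```python
from collections import Counter
import math
import re

def top_distinctive(corpus_a, corpus_b, n=10, min_count=5):
    """Rank words that occur disproportionately often in corpus_a relative
    to corpus_b, using a smoothed log-ratio of relative frequencies."""
    counts_a = Counter(re.findall(r"[a-z']+", corpus_a.lower()))
    counts_b = Counter(re.findall(r"[a-z']+", corpus_b.lower()))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    scores = {}
    for word, c_a in counts_a.items():
        if c_a < min_count:
            continue  # skip rare words, which dominate raw ratios
        c_b = counts_b.get(word, 0)
        # add-one smoothing so words absent from corpus_b don't divide by zero
        scores[word] = math.log(((c_a + 1) / (total_a + 1)) /
                                ((c_b + 1) / (total_b + 1)))
    return sorted(scores, key=scores.get, reverse=True)[:n]

# e.g. top_distinctive(controversial_text, normal_text) would surface words
# that stand out in the review-bombed corpus
```

Nothing fancy, but it would at least surface candidate words emergently without me supplying a list up front.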
Anyone able to advise how I should proceed? I’ve used NLTK both as shown in this tutorial and through TextBlob, but I’ve also used TensorFlow’s CPU-based classification model and spaCy, the last of which is my favorite. I’m wondering if gensim, which I played around with briefly but couldn’t tell whether it’s specialized for what I’m doing, might be what I need.
Relatedly: I have all of my data stored as individual reviews, each associated with a particular video game, in a Neo4j instance that I can query at any time. For these big corpora, is there any value in running text classification on individual reviews? I wouldn’t have bothered; I’d just combine them into one giant corpus to be broken apart later. But only TensorFlow has been able to handle even my smaller corpus (about 3MB of text; the controversial game is almost 20MB of raw reviews).
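For context, the per-review route I’m asking about would look something like this in spaCy, streaming documents one at a time rather than joining everything into one giant string (I’m using the tokenizer-only `spacy.blank("en")` pipeline in this sketch; a loaded model like `en_core_web_sm` would work the same way, and `reviews` stands in for strings pulled out of Neo4j):

```python
import spacy
from collections import Counter

# tokenizer-only pipeline; swap in spacy.load("en_core_web_sm")
# if tagging or parsing is needed
nlp = spacy.blank("en")

def word_counts(reviews):
    """Stream reviews through spaCy individually instead of concatenating
    them into one enormous document."""
    counts = Counter()
    for doc in nlp.pipe(reviews, batch_size=50):
        counts.update(tok.lower_ for tok in doc if tok.is_alpha)
    return counts
```

If I understand correctly, spaCy caps single-document length (`nlp.max_length` defaults to 1,000,000 characters), which might explain why my multi-megabyte combined corpora choke everything but TensorFlow; so maybe keeping reviews separate isn’t just viable but necessary?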
Any help would be deeply appreciated! Cheers, Ellie