I’ve been working through this tutorial; I follow it and am impressed with NLTK, but my natural language processing goals are somewhat more ambitious.
What I’m doing is analyzing video game reviews across different games, to see how different games are reviewed differently. I have some large text corpora of reviews for games that seem to have had “normal” review cycles, and one big corpus for a game that got review bombed. I want to do a couple of things:
Can anyone advise how I should proceed? I’ve used NLTK both as shown in this tutorial and through TextBlob, but I’ve also used TensorFlow’s CPU-based classification model and spaCy, the last of which is my favorite. I’m wondering if gensim, which I played around with briefly but couldn’t tell whether it was specialized in the right way for what I’m doing, might be what I need.
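To make the comparison I’m after concrete: before committing to any one toolkit, even a stdlib-only word-frequency contrast between the review-bombed corpus and a “normal” one would show whether the language actually differs. A minimal sketch (the `log_odds` function and the toy tokenized reviews are my own illustration, not from any of the libraries above):

```python
from collections import Counter
import math

def log_odds(corpus_a, corpus_b):
    """Compare word usage between two corpora (lists of tokenized reviews).

    Returns {word: log-odds score}; positive values mark words
    over-represented in corpus_a relative to corpus_b.
    """
    counts_a = Counter(w for review in corpus_a for w in review)
    counts_b = Counter(w for review in corpus_b for w in review)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    scores = {}
    for w in vocab:
        # add-one smoothing so words absent from one corpus
        # don't produce log(0) or division by zero
        p_a = (counts_a[w] + 1) / (total_a + len(vocab))
        p_b = (counts_b[w] + 1) / (total_b + len(vocab))
        scores[w] = math.log(p_a / p_b)
    return scores

# Toy data standing in for real tokenized reviews:
bombed = [["refund", "broken", "broken"], ["broken", "scam"]]
normal = [["fun", "great"], ["fun", "polished"]]
scores = log_odds(bombed, normal)
```

If the top-scoring words for the bombed corpus are mostly off-topic grievances rather than gameplay terms, that would already be a useful signal before reaching for topic models.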
Relatedly: I have all of my data stored as individual reviews, each associated with a particular video game, in a Neo4j instance, and I can query Neo4j at any time to retrieve it. For these big corpora, is there any value in running text classification on individual reviews? Otherwise I’d just combine them into one giant corpus per game to be broken apart later, but only TensorFlow has been able to handle even my smaller corpus of reviews (about 3 MB of text; the controversial game is almost 20 MB of raw reviews).
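For context, here’s roughly what I mean by scoring reviews individually and aggregating per game, which would sidestep the giant-corpus memory problem entirely: the reviews stream out of the database one at a time, each gets a score, and only the running results are kept. `score_review` below is a stand-in for whichever real classifier I end up using (TextBlob polarity, a spaCy pipeline, etc.), and the inline review stream is a placeholder for a Neo4j query result:

```python
from collections import defaultdict
from statistics import mean

def score_review(text):
    # Placeholder scorer: swap in TextBlob(text).sentiment.polarity,
    # a spaCy text categorizer, or any other per-document classifier.
    # This toy version just counts words from a small negative lexicon.
    negative = {"broken", "refund", "scam"}
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(w in negative for w in words)
    return 1.0 - 2.0 * (hits / len(words))  # range [-1.0, 1.0]

def aggregate_scores(review_stream):
    """review_stream yields (game, review_text) pairs, e.g. straight
    from a database query. Reviews are scored one at a time, so the
    20 MB corpus never has to exist as a single in-memory document."""
    per_game = defaultdict(list)
    for game, text in review_stream:
        per_game[game].append(score_review(text))
    return {game: mean(scores) for game, scores in per_game.items()}

# Placeholder for rows returned by a Neo4j query:
stream = [
    ("GameA", "fun and polished"),
    ("GameA", "great fun"),
    ("GameB", "broken scam refund broken"),
]
summary = aggregate_scores(stream)
```

The per-game averages (or score distributions) could then be compared directly, without any tool ever needing to load a multi-megabyte blob of concatenated text.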
Any help would be deeply appreciated! Cheers, Ellie