{"id":58971,"slug":"natural-language-classification-with-python-best-library-resource-for-determining-sentiment-associated-w-words","state":"published","title":"Natural language classification with Python - best library/resource for determining sentiment associated w words?","content":"So, I've been working through [this tutorial](https://www.digitalocean.com/community/tutorials/how-to-perform-sentiment-analysis-in-python-3-using-the-natural-language-toolkit-nltk?segment=1*1nr6qzo*s_amp_id*Q0dWUnZTWXpqN0s3cm9fV0ZPUFdzeFRocDJOUUZZNzJ3ZzFwaExLX2ZlaENPc3ZId0Q4azVwWk13ZFVUS3dtNQ..) and I understand it and am impressed with NLTK; however, I am somewhat more ambitious with my natural language processing goals. \r\n\r\nWhat I'm doing is analyzing video game reviews of different games, to see how different games are reviewed differently. I have some large text corpuses of reviews for video games that seem to have received \"normal\" review cycles, and one big one for a game that got review-bombed. I want to do a couple of things:\r\n\r\n1. Compare the \"normal\" corpus with the \"controversial\" corpus and see what stands out most.\r\n2. Determine word associations, both sentiment-wise (is \"graphics\" more positive for some games than others?) and in terms of which words co-occur more often in each corpus (\"graphics\" and \"nextgen\" maybe). The thing is I don't have a list of words to check - I want to sort of find this emergently.\r\n\r\nAnyone able to advise how I should proceed? I've used NLTK both as shown in this tutorial and through TextBlob, but I've also used TensorFlow's CPU-based classification model and spaCy, the last of which is my favorite. I'm wondering if gensim - which I played around with briefly, but couldn't tell whether it was specialized in the right way for what I'm doing - might be what I need.\r\n\r\nRelatedly: I have all of my data stored individually as reviews associated with a particular video game in a Neo4j instance. 
I can query Neo4j at any time to retrieve this data. For these big corpuses, is there any value to running text classification on individual reviews? I wouldn't even bother; I'd just combine them into giant corpuses to be broken apart, but only TensorFlow has been able to handle even my smaller corpus of reviews (about 3MB of text; the controversial game is almost 20MB of raw reviews).\r\n\r\nAny help would be deeply appreciated! Cheers, Ellie","tutorial_id":null,"user_id":1291628,"editor_id":null,"language":"en","created_at":"2020-11-05T03:20:25.020Z","updated_at":"2020-11-05T03:20:25.568Z","last_validated_at":null,"views":384,"accepted_id":null,"pinned_id":null,"comments_locked_at":null,"answers_locked_at":null,"new_answers_locked_at":null,"user":{"id":1291628,"username":"drellielockhart","slug":"drellielockhart","first_name":null,"last_name":null,"title":null,"bio":null,"website":null,"twitter_handle":null,"github_handle":null,"linkedin_url":null,"stackoverflow_url":null,"show_completed_tutorials_at":null,"city":null,"country":null,"skills":null,"has_seen_registration_at":"2022-08-30T17:06:14.808Z","created_at":"2020-10-30T01:46:50.161Z","updated_at":"2020-11-05T03:19:02.508Z","email":"e65cd164bd6b21708403da55105c9227bb5d948b93e70f2293f9bf85b3b8003c","gravatar":"https://www.gravatar.com/avatar/0265005d904f49bbd3f288cc1cc6379c4fa7051f33df1d9081d7aa708842a10f?default=retro"},"editor":{},"tutorial":{},"tags":[{"id":121,"state":"published","slug":"project","name":"Programming Project","description":"Programming Project tutorials bring together several programming concepts and open opportunities for exploring solutions\r\n","created_at":"2017-02-23T01:27:39.633Z","updated_at":"2022-01-25T08:16:48.546Z","admin_use_only_at":null,"admin_view_only_at":null,"show_in_cms_at":null,"type":"tag"},{"id":114,"state":"published","slug":"development","name":"Development","description":"Tools and tips for users as they build and maintain software, including information on writing 
and revising code, prototyping, researching, and modifying problematic components. ","created_at":"2016-08-19T17:31:28.498Z","updated_at":"2022-01-28T15:00:37.430Z","admin_use_only_at":null,"admin_view_only_at":null,"show_in_cms_at":null,"type":"tag"},{"id":122,"state":"published","slug":"data-analysis","name":"Data Analysis","description":"Data analysis refers to the domain of fields that investigate the structure of data and use it to identify patterns and possible solutions to problems.","created_at":"2017-03-14T18:48:07.782Z","updated_at":"2022-01-28T08:06:10.521Z","admin_use_only_at":null,"admin_view_only_at":null,"show_in_cms_at":null,"type":"tag"},{"id":29,"state":"published","slug":"python","name":"Python","description":"Python is a flexible and versatile programming language that can be leveraged for many use cases, with strengths in scripting, automation, data analysis, machine learning, and back-end development. It is a great tool for both new learners and experienced developers alike.","created_at":"2022-01-31T18:04:44.879Z","updated_at":"2024-03-11T21:22:52.899Z","admin_use_only_at":null,"admin_view_only_at":null,"show_in_cms_at":null,"type":"tag"}]}