Big Data

47 Results
  • Tutorial

    How to Install Hadoop in Stand-Alone Mode on Ubuntu 16.04

    Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on a cluster of inexpensive machines. It was the first major open source project in the big data playin...
    By Melissa Anderson Clustering Big Data Ubuntu Ubuntu 16.04
  • Tutorial

    An Introduction to Big Data Concepts and Terminology

    Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and gather insights from large data sets. While the problem of working with data that exceeds the com...
    By Justin Ellingwood Scaling Clustering Big Data Conceptual
  • Tutorial

    Hadoop, Storm, Samza, Spark, and Flink: Big Data Frameworks Compared

    Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and gather insights from large datasets. While the problem of working with data that exceeds the comp...
    By Justin Ellingwood Big Data Conceptual Development
  • Tutorial

    How To Install Hadoop in Stand-Alone Mode on Ubuntu 18.04

    In this tutorial, you'll learn how to install Hadoop in stand-alone mode on an Ubuntu 18.04 server. You'll also run an example MapReduce program to search for occurrences of a regular expression in text files.
    By Melissa Anderson, Hanif Jetha Clustering Big Data Ubuntu Ubuntu 18.04
  • Tutorial

    An Introduction to Hadoop

    Apache Hadoop is one of the earliest and most influential open-source tools for storing and processing the massive amount of readily-available digital data that has accumulated with the rise of the World Wide Web. It ...
    By Melissa Anderson Clustering Big Data Conceptual
  • Tutorial

    How To Spin Up a Hadoop Cluster with DigitalOcean Droplets

    This tutorial will cover setting up a Hadoop cluster on DigitalOcean. The Hadoop software library is an Apache framework that lets you process large data sets in a distributed way across server clusters through levera...
    By Jeremy Morris Big Data Data Analysis Solutions Clustering DigitalOcean Ubuntu 16.04
  • Tutorial

    How To Install and Use ClickHouse on Ubuntu 20.04

    ClickHouse is an open source, column-oriented analytics database created by Yandex for OLAP and big data use cases. In this tutorial, you'll install the ClickHouse database server and client on your machine. You'll us...
    By bsder Big Data Databases Ubuntu 20.04
  • Tutorial

    User Data Collection: Balancing Business Needs and User Privacy

    Collecting user data is common practice in modern sites and applications as a way of providing creators with more information to make decisions and create better experiences. Among other benefits, data can be used to ...
    By Justin Ellingwood Conceptual Big Data Data Analysis
  • Tutorial

    How to Install Hadoop in Stand-Alone Mode on Debian 9

    In this tutorial, you'll install Hadoop in stand-alone mode on a Debian 9 server. You'll also run an example MapReduce program to search for occurrences of a regular expression in text files.
    By Brian Hogan, Melissa Anderson, Hanif Jetha Big Data Debian 9
  • Tutorial

    How To Install and Use ClickHouse on Debian 9

    ClickHouse is an open-source, column-oriented analytics database created by Yandex (https://yandex.com) for OLAP and big data use cases. In this tutorial, you'll install the ClickHouse database server and client on yo...
    By bsder Databases Data Analysis Big Data Debian 9
  • Tutorial

    What is Big Data?

    Big data is a blanket term for the non-traditional strategies and technologies needed to organize, process, and gather insights from large datasets. Many users and organizations are turning to big data for certain typ...
    By Brian Boucheron Glossary Big Data
  • Tutorial

    How to Set Up the Titan Graph Database with Cassandra and ElasticSearch on Ubuntu 16.04

    Titan is an open-source Graph database that is highly scalable. A Graph database is a type of NoSQL where all data is stored as nodes and edges. A graph database is suitable for applications that use highly connected ...
    By Kevin Isaac Big Data Elasticsearch Ubuntu 16.04
  • Question

    Recommended Droplet configuration for WebScrapping

    Hello everyone! I'm new to DigitalOcean and I'm really in need of learning more. I have a project that makes several requests to several servers constantly, i.e. imagine having a Python script that has around...
    Accepted Answer: Hi there, I believe that the Droplet that you've selected is quite good for a start. I could suggest regularly checking the monitoring graphs via your DigitalOcean Control panel and see how the resources are being uti...
    1 By quirozvalandres Debian Big Data Building on DigitalOcean Networking DigitalOcean Droplets Ubuntu 20.04
  • Question

    Getting account unblocked

    Hey guys, new here. Been using DigitalOcean off and on for the last few years, mostly for personal projects, and have really liked it a lot. Recently, though, I started using it for Big Data processes. What I do is spi...
    Accepted Answer: Hey friend, I'd like to explain a bit about the reason for this, and the thoughts behind it. Please know that I'm about to say a lot of things that may not be relevant to you. It isn't necessarily that crypto is again...
    1 By jwalz DigitalOcean Big Data Ubuntu 18.04
  • Question

    What are the most popular Hadoop tools/projects?

    I have a question: what are the most popular Hadoop tools/projects?
    Accepted Answer: Hive is an SQL-like language for data processing, which gets converted into a MapReduce job behind the scenes. Hive is popular because it is written using familiar SQL-like syntax. This is often confusing, because Hiv...
    2 By gulatisneha56 Big Data
  • Question

    Nexii Labs is a leading storage, virtualisation and Cloud service providers in India

    DevOps has changed the way an IT organization works and how it gets things done. DevOps services and offerings connect development, technical operations, and quality assurance personnel in such a way that the process...
    Accepted Answer: @ryanpq SPAM!
    1 By nexiilabs Backups Storage Getting Started Open Source Big Data Clustering CoreOS Arch Linux Ubuntu Ubuntu 16.04 Debian
  • Question

    How to fix 502 Bad Gateway error message?

    I was trying to upload a data file of around 600 MB to my project, which is hosted on DigitalOcean. The upload starts, but Nginx returns a 502 Bad Gateway error. While the upload completely works fine on my local syste...
    2 By rikeshk012330 Nginx Big Data DigitalOcean Python DigitalOcean Droplets
  • Question

    Can i trust Digital ocean ?

    I want to run some important things on DO that require high uptime. Can I trust DO?
    2 By anirudhahikoka404 Big Data
  • Question

    AI - Time series analysis with which DBMS?

    Hi guys, does anyone deal with AI? I recently visited an online forum on AI, because I'm very interested in it and I'm thinking about taking this direction professionally - neuroinformatics and artificial intelligence...
    0 By diggieFish Databases Big Data Data Analysis
  • Question

    How do I manually calculate average session duration?

    Hello guys, can anyone tell me the exact formula for average session duration and where it comes from? I'm having trouble finding a manual calculation of the average session duration. For example, from the Google ana...
    1 By rifulabyssal Data Analysis DigitalOcean Articles Big Data GraphQL