Big Data

39 Results
  • Tutorial

    How to Install Hadoop in Stand-Alone Mode on Ubuntu 16.04

    Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on a cluster of inexpensive machines. It was the first major open source project in the big data playin...
    By Melissa Anderson Clustering Big Data Ubuntu Ubuntu 16.04
  • Tutorial

    An Introduction to Big Data Concepts and Terminology

    Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and gather insights from large data sets. While the problem of working with data that exceeds the com...
    By Justin Ellingwood Scaling Clustering Big Data Conceptual
  • Tutorial

    Hadoop, Storm, Samza, Spark, and Flink: Big Data Frameworks Compared

    Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and gather insights from large datasets. While the problem of working with data that exceeds the comp...
    By Justin Ellingwood Big Data Conceptual
  • Tutorial

    An Introduction to Hadoop

    Apache Hadoop is one of the earliest and most influential open-source tools for storing and processing the massive amount of readily-available digital data that has accumulated with the rise of the World Wide Web. It ...
    By Melissa Anderson Clustering Big Data Conceptual
  • Tutorial

    How to Install Hadoop in Stand-Alone Mode on Ubuntu 18.04

    In this tutorial, you'll learn how to install Hadoop in stand-alone mode on an Ubuntu 18.04 server. You'll also run an example MapReduce program to search for occurrences of a regular expression in text files.
    By Melissa Anderson, Hanif Jetha Clustering Big Data Ubuntu Ubuntu 18.04
  • Tutorial

    How To Spin Up a Hadoop Cluster with DigitalOcean Droplets

    This tutorial will cover setting up a Hadoop cluster on DigitalOcean. The Hadoop software library is an Apache framework that lets you process large data sets in a distributed way across server clusters through levera...
    By Jeremy Morris Big Data Data Analysis Solutions Clustering DigitalOcean Ubuntu 16.04
  • Tutorial

    How To Install and Use ClickHouse on Debian 9

    ClickHouse is an open-source, column-oriented analytics database created by Yandex (https://yandex.com) for OLAP and big data use cases. In this tutorial, you'll install the ClickHouse database server and client on yo...
    By bsder Databases Data Analysis Big Data Debian 9
  • Tutorial

    User Data Collection: Balancing Business Needs and User Privacy

    Collecting user data is common practice in modern sites and applications as a way of providing creators with more information to make decisions and create better experiences. Among other benefits, data can be used to ...
    By Justin Ellingwood Conceptual Big Data Data Analysis
  • Tutorial

    How to Install Hadoop in Stand-Alone Mode on Debian 9

    In this tutorial, you'll install Hadoop in stand-alone mode on a Debian 9 server. You'll also run an example MapReduce program to search for occurrences of a regular expression in text files.
    By Brian Hogan, Melissa Anderson, Hanif Jetha Big Data Debian 9
  • Tutorial

    How to Set Up the Titan Graph Database with Cassandra and ElasticSearch on Ubuntu 16.04

    Titan is a highly scalable, open-source graph database. A graph database is a type of NoSQL database in which all data is stored as nodes and edges. A graph database is suitable for applications that use highly connected ...
    By Kevin Isaac Big Data Elasticsearch Ubuntu 16.04
  • Question

    Getting account unblocked

    Hey guys, new here. Been using DigitalOcean off and on for the last few years, mostly for personal projects and have really liked it a lot. Recently, though, I started using it for Big Data processes. What I do is spi...
    Accepted Answer: Hey friend, I'd like to explain a bit about the reason for this, and the thoughts behind it. Please know that I'm about to say a lot of things that may not be relevant to you. It isn't necessarily that crypto is again...
    1 By jwalz DigitalOcean Big Data Ubuntu 18.04
  • Question

    What are the most popular Hadoop tools/projects?

    I have a question: what are the most popular Hadoop tools/projects?
    Accepted Answer: Hive is an SQL-like language for data processing, which gets converted into a MapReduce job behind the scenes. Hive is popular because it is written using familiar SQL-like syntax. This is often confusing, because Hiv...
    2 By gulatisneha56 Big Data
  • Question

    Nexii Labs is a leading storage, virtualisation and Cloud service provider in India

    DevOps has changed the way an IT organization works and how it gets things done. DevOps services and offerings connect development, technical operations and quality assurance personnel in such a way that the process...
    Accepted Answer: @ryanpq SPAM!
    1 By nexiilabs Backups Storage Getting Started Open Source Big Data Clustering CoreOS Arch Linux Ubuntu Ubuntu 16.04 Debian
  • Question

    Can you send data to a server through cellular?

    Hello all. At my current job we have sensors on one of our building's roofs that send environmental data from the roof to a physical server in the building. This system is proprietary and our building does not allow ...
    1 By csmall9 Applications Databases Open Source Big Data Conceptual
  • Question

    What is quantum computing?

    What is quantum computing, and what is the future of quantum computing?
    1 By chandu12fvl Big Data Databases DigitalOcean Machine Learning
  • Question

    CPU Optimized Droplet works very slow

    Hello, I was using a CPU Optimized Droplet. In the first month it worked very fast with a small amount of information; for example, a process that I developed took a maximum of 2 minutes to deliver results. But the sec...
    2 By AndresRamos95 DigitalOcean Big Data
  • Question

    I would like to max my CPU usage for the foreseeable future. Can I?

    I need to do math. Math is hard. I would use 100% CPU for the foreseeable future. Is this allowed? I have an old account and all is prepaid. I would use a new 3 CPU $15 droplet.
    2 By DigitalOceana234400ef21fe9 DigitalOcean Big Data
  • Question

    Kafka requires Zookeeper, but the 'Installing Kafka' tutorial doesn't install Zookeeper. Then Kafka magically works anyway? What? How?

    Hello, all, I'm new to Ubuntu, Kafka, and Zookeeper, and this has me puzzled. From everything I've read, Zookeeper is part and parcel of Kafka. However, the tutorial here (https://www.digitalocean.com/community/tutori...
    2 By abelwingnut Apache Big Data DigitalOcean Articles Development NoSQL Ubuntu 18.04
  • Question

    How to change all my space files permissions

    Hi. I uploaded 60G of images to my new Space and I found out that all the files have private permissions. How can I change the permission to public for all the files in the Space? I have around 7M images. Thanks, Liron
    1 By liron DigitalOcean API Big Data Ubuntu
  • Question

    Scraping data on a website

    Hello! First, sorry for my English. I have a PHP script to scrape data from a website. I would like to run this script and be untraceable... Is there a DigitalOcean solution that meets my needs? Thanks. Regards.
    1 By mckbgbg Big Data Automated Setups Debian