Big data technologies make it possible to apply data mining to large volumes of data. They are also effective in other scenarios.

The need for a “Divide and Conquer” approach emerges when your application can no longer handle the data volume and growth velocity. This situation is known as a Big Data problem.

There are different big data technologies you can leverage to let your application scale. They range from data sharding and in-memory databases to SQL on Hadoop, NoSQL databases, and specialized big data components for search, graphs, text and so on. They may or may not involve the Hadoop ecosystem: we make the best of both worlds, traditional database technologies and the new big data technologies.
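As an illustration of the "Divide and Conquer" idea behind data sharding, here is a minimal Python sketch. It is not tied to any of the products listed on this page. The function names (`shard_for`, `partition`, `total_amount`) and the record fields (`id`, `amount`) are hypothetical and chosen only for the example.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, num_shards, key_field="id"):
    """Divide: group records by the shard their key hashes to."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(str(record[key_field]), num_shards)].append(record)
    return shards


def total_amount(records, num_shards=4):
    """Conquer: aggregate each shard on its own, then merge the partial sums."""
    shards = partition(records, num_shards)
    partial_sums = [sum(r["amount"] for r in shard) for shard in shards.values()]
    return sum(partial_sums)


if __name__ == "__main__":
    # Hypothetical sample data: 100 records with an id and an amount.
    sample = [{"id": i, "amount": i * 10} for i in range(100)]
    print(total_amount(sample))
```

In this sketch all shards live in one process. In a real deployment each shard would typically sit on its own node or database instance, so the partial aggregations could run in parallel.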

We are also familiar with other deployment alternatives: if you don’t need or want to deploy your infrastructure in-house, private or public cloud and managed services are other options.

Technologies

  • Apache HDFS
  • Apache Parquet

  • Apache Sqoop
  • Apache Kafka

  • Apache Cassandra
  • Apache HBase
  • Apache Tez
  • Apache Hive
  • Apache Drill
  • Apache Impala
  • Presto
  • Spark SQL
  • Lucene
  • Elasticsearch
  • Scala
  • Python