Big Data Infrastructure

Big data technologies make it possible to apply data mining to large volumes of data, but that is not the only scenario in which they are effective.

The need for a “Divide and Conquer” approach emerges when your application can no longer handle the data volume and growth velocity. This situation is known as a Big Data problem.

There are several big data technologies you can leverage to let your application scale. They range from data sharding, in-memory databases, SQL on Hadoop and NoSQL databases to specialized big data components for search, graphs, text and so on. They may or may not involve the Hadoop ecosystem: we make the best of both worlds, the traditional database technologies and the new big data technologies.
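To make the "Divide and Conquer" idea concrete, here is a minimal sketch of hash-based data sharding in Python. It is an illustration under stated assumptions, not a description of any particular product: the function names (`shard_for`, `partition`, `sharded_count`), the `user_id` field and the choice of SHA-256 as a stable hash are all hypothetical. Records are split across shards by key, each shard is processed independently, and the partial results are merged.

```python
import hashlib
from collections import Counter, defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    A cryptographic digest is used only because it is stable across
    processes and machines (Python's built-in hash() is not).
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, num_shards, key_field="user_id"):
    """Divide: group records by the shard their key maps to."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(str(record[key_field]), num_shards)].append(record)
    return shards


def count_per_user(records, key_field="user_id"):
    """Work done on one shard: count records per key."""
    return Counter(str(r[key_field]) for r in records)


def sharded_count(records, num_shards, key_field="user_id"):
    """Conquer: process each shard on its own, then merge the partial counts."""
    total = Counter()
    for shard_records in partition(records, num_shards, key_field).values():
        total.update(count_per_user(shard_records, key_field))
    return total


if __name__ == "__main__":
    # Hypothetical sample data: 100 click events from 7 users.
    events = [{"user_id": f"user-{i % 7}", "action": "click"} for i in range(100)]
    print(sharded_count(events, num_shards=4))
```

In this sketch the shards are processed one after another in a single process. In a real deployment each shard would typically live on a separate node or database instance, which is what lets the application scale with the data volume.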

If you don’t need or want to deploy your infrastructure in-house, we can advise on the alternatives: private cloud, public cloud and managed services.

Services

 

  • Deployment Service: selection of state-of-the-art software based on your requirement specifications. Installation and configuration of software on-premises or in the cloud.
  • Data Engineering Service: support in implementing data pipelines, from ingestion to exploitation.
  • Machine Learning Implementation Services: development of machine learning solutions leveraging your big data infrastructure.

 

 

Technologies

 

  • Apache HDFS
  • Apache Parquet

 

 

  • Apache Sqoop
  • Apache Kafka

 

  • Apache Cassandra
  • Apache HBase
  • Apache Tez
  • Apache Hive
  • Apache Drill
  • Apache Impala
  • Presto
  • Spark SQL
  • Lucene
  • Elasticsearch
  • Scala
  • Python