We help our customers take advantage of modern Advanced Analytics tools and techniques in a pragmatic, to-the-point fashion.
This could be done either as an extension of your existing Data Warehouse infrastructure or as a standalone data science project.
At the beginning, there is a vague question and uncertainty about what Advanced Analytics can do to improve your business. It is not clear whether the data is available, whether the objectives are realistic, and whether embarking on a data science project is worth the effort. During this stage we provide you with use cases that relate to your business domain and make an initial assessment of your data assets.
Also called “data plumbing”, “data munging”, “data cleaning” or even “data transformation”. Between 50 and 80 percent of the time in a data science project is spent getting, cleaning, tidying, transforming and preparing the data. Real-life data is messy, noisy and ugly. We know how to handle it.
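As an illustrative sketch of what this stage looks like in practice, here is a small, standard-library-only Python example; the data, column names and cleaning rules are all hypothetical:

```python
import csv
import io

# Hypothetical raw extract: stray whitespace, inconsistent casing,
# a missing value and a duplicate row -- typical real-life messiness.
RAW = """customer,region,revenue
Acme ,  north,1200
acme,north,1200
Beta Corp,South,
Gamma LLC,north,850
"""

def clean_rows(text):
    """Tidy a messy CSV extract: trim whitespace, normalise case,
    drop rows with a missing revenue, and de-duplicate."""
    seen = set()
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        customer = row["customer"].strip().lower()
        region = row["region"].strip().lower()
        revenue = row["revenue"].strip()
        if not revenue:          # missing value: drop (or impute)
            continue
        key = (customer, region)
        if key in seen:          # exact duplicate record: keep first
            continue
        seen.add(key)
        rows.append({"customer": customer, "region": region,
                     "revenue": float(revenue)})
    return rows

print(clean_rows(RAW))
```

In a real project this logic would typically live in R or pandas pipelines rather than hand-rolled loops, but the steps (trim, normalise, handle missing values, de-duplicate) are the same.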
We explore the data to get useful insights. This means presenting and plotting the data in a variety of ways to extract patterns and test assumptions. We leverage our domain knowledge and your own business knowledge to make sure we understand what we observe. We know the main tools of the trade, both open source and proprietary.
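A minimal sketch of this kind of exploration, using made-up sales figures: group the observations and summarise each group, the tabular equivalent of a quick comparison plot.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical sample: order values tagged by sales channel.
orders = [
    ("web", 120), ("web", 95), ("web", 140), ("web", 110),
    ("store", 60), ("store", 75), ("store", 55), ("store", 70),
]

# Group by channel, then summarise each group to test the assumption
# that web orders are larger than store orders.
by_channel = defaultdict(list)
for channel, value in orders:
    by_channel[channel].append(value)

for channel, values in sorted(by_channel.items()):
    print(f"{channel:>6}: n={len(values)} "
          f"mean={mean(values):.1f} sd={stdev(values):.1f}")
```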
We build, test and validate models for classification, regression and clustering tasks, amongst others. We choose the appropriate machine learning algorithms for model development. When a problem is too complex, it is sometimes necessary to craft a custom algorithm, or to combine several, to get optimal results. Beyond the theoretical modeling aspects, we care about feasibility and simplicity: sometimes an algorithm is not realistic to deploy to a production environment, however good its results.
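As an example of the build-and-validate loop, here is a sketch using scikit-learn (one of the libraries listed below); the dataset and model choice are illustrative, not a recommendation for any particular problem:

```python
# Fit a simple, deployable baseline classifier and estimate how well
# it generalises before considering anything more complex.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Pipeline: scale the features, then fit a linear classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation gives an honest accuracy estimate.
scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

A simple, well-validated baseline like this is often the right thing to ship; more elaborate algorithms only earn their keep when they clearly beat it.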
Once a model is validated and an algorithm is implemented, we need to deploy it to get actual results. This can take the form of a recommender system, an analytics backend or a REST API. We help you make the best use of it as a data product and integrate it into your environment.
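To make the REST API idea concrete, here is a minimal standard-library sketch: the `predict` function stands in for a real trained model, and the `/predict` route is illustrative.

```python
import json
from wsgiref.simple_server import make_server

def predict(features):
    """Stand-in for a trained model's scoring function."""
    return {"score": sum(features) / max(len(features), 1)}

def app(environ, start_response):
    """Tiny WSGI app exposing the model at POST /predict."""
    if environ["PATH_INFO"] == "/predict" and environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size))
        body = json.dumps(predict(payload["features"])).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

# To serve locally (blocks until interrupted):
#   with make_server("", 8000, app) as server:
#       server.serve_forever()
```

In practice this would sit behind a proper web framework and an application server, but the contract is the same: the model becomes an endpoint that the rest of your environment can call.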
We are technology agnostic, but we embrace standards and promote Open Source tools. Here is a sample of the tools and techniques we use in our data science projects.
R, Python, RapidMiner, Weka
R (lots of packages), Python scikit-learn, Vowpal Wabbit, Apache Spark MLlib, H2O (API/Sparkling Water), Apache Mahout
R/Python standard packages & modules, Tableau, Apache Zeppelin, IPython, Spark notebooks, Microsoft Power BI.