Data scientists, data analysts, business analysts, owners of data-driven companies: what do they have in common? They all need to be sure that the data they consume is in the best possible shape.
Right now, with the emergence of Big Data, Machine Learning, Deep Learning and Artificial Intelligence (the New Era, as I call it), almost every company or entrepreneur wants to create a solution that uses data to predict or analyze.
Until now there was no solution to the problem common to all data-driven projects in the New Era: data cleansing and exploration. With Optimus we are launching an easy-to-use, easy-to-deploy, open source framework to clean and analyze data in parallel using state-of-the-art technologies.
Optimus 1.0.0 framework
The first official release of Optimus is out now! With this version you can use Apache Spark 2.2.0 and Python to build your data pipelines in an easy and scalable way. You can detect outliers and erase them, impute missing data using machine learning, clean special characters in your data set, move and update your columns with our data wrangling tools, make beautiful plots to share your discoveries, and much more!
Please visit our website and documentation for a full description of the framework: https://hioptimus.com
If you want a peek at what Optimus can do for you, make sure to visit our examples:
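To make those cleaning steps a bit more concrete, here is a minimal sketch in plain Python of what they mean on a tiny data set. This is not the Optimus API and it does not use Spark: the function names (clean_special_chars, drop_outliers, impute_missing) are hypothetical, the outlier rule is a common median-based heuristic chosen for illustration, and the imputation uses a simple median fill as a stand-in for the machine learning approach mentioned above.

```python
import re
import statistics


def clean_special_chars(text):
    """Keep only letters, digits, and spaces, then trim the ends."""
    return re.sub(r"[^A-Za-z0-9 ]", "", text).strip()


def drop_outliers(values, threshold=3.5):
    """Drop values whose modified z-score exceeds the threshold.

    The score uses the median absolute deviation (MAD), which is less
    sensitive to extreme values than the mean and standard deviation.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return list(values)
    return [v for v in values if 0.6745 * abs(v - median) / mad <= threshold]


def impute_missing(values):
    """Replace None with the median of the known values.

    Assumption: a median fill stands in here for a learned imputation model.
    """
    known = [v for v in values if v is not None]
    fill = statistics.median(known)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    print(clean_special_chars(" Bumble_bee? "))
    print(drop_outliers([10, 12, 11, 13, 12, 300]))
    print(impute_missing([1, None, 3]))
```

A framework like Optimus applies this kind of operation to whole columns of a distributed data set, while the sketch above works on small in-memory lists only.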
https://nbviewer.jupyter.org/github/ironmussa/Optimus/blob/master/examples/Optimus_Example.ipynb