Meeting the Challenge
DataArt was chosen as a trusted development partner with strong experience in building Big Data and IoT-based solutions.
The main problems were the huge volume of data, the complicated layer structure, and the wide array of file formats. To address them, the DataArt team decided to train a classifier on a model whose features were keywords taken from a test sample, each with a value equal to the number of times that keyword appears in the text.
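The keyword-count features described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the keyword list, the tokenization rule, and the function name `keyword_counts` are all assumptions, since the original vocabulary is not given.

```python
import re
from collections import Counter

# Hypothetical keyword vocabulary; the real list used in the project is not published.
KEYWORDS = ["error", "timeout", "failed", "refused"]


def keyword_counts(text, keywords=KEYWORDS):
    """Return one occurrence count per keyword, in vocabulary order."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never appear.
    return [counts[k] for k in keywords]


# Made-up log line, used only to show the shape of the feature vector.
line = "ERROR: connection refused, retry failed; second retry failed"
print(keyword_counts(line))  # [1, 0, 2, 1]
```

Each log text becomes a fixed-length vector of counts, which is the kind of input a decision tree classifier can be trained on.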
As the model, our team used a pruned decision tree from the WEKA library. To quickly locate suspected errors, the IoT team wrote an Apache Spark Streaming job, which "listens" to the stream of log messages, processes them, and performs real-time NLP analysis of each log file. If an error occurs, the end user receives an alert with a description of the potential problem.
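The listen, classify, and alert flow can be illustrated with a small plain-Python sketch. It deliberately uses neither Spark Streaming nor WEKA: the stream is an ordinary list, and `looks_like_failure` is a hand-written stand-in for the trained decision tree. The keyword list, the rules, and the log messages are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword vocabulary (assumption, not from the project).
KEYWORDS = ["error", "timeout", "failed", "refused"]


def featurize(message):
    """Turn a log message into a vector of keyword counts."""
    counts = Counter(re.findall(r"[a-z]+", message.lower()))
    return [counts[k] for k in KEYWORDS]


def looks_like_failure(features):
    """Hand-written stand-in for the trained, pruned decision tree."""
    error, timeout, failed, refused = features
    if error > 0:
        return True
    # Several weaker signals together are also treated as a failure.
    return timeout + failed + refused >= 2


def alerts(stream):
    """Consume messages one by one and yield an alert for each suspected failure."""
    for message in stream:
        if looks_like_failure(featurize(message)):
            yield f"ALERT: possible failure: {message}"


# Made-up messages standing in for the live log stream.
log_stream = [
    "INFO instance started",
    "ERROR connection refused",
    "WARNING request timeout, retry failed",
]

for alert in alerts(log_stream):
    print(alert)
```

In a real deployment the loop would be replaced by a streaming job and the rule function by the trained model; the sketch only shows how each incoming message is featurized, classified, and turned into an alert.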

Business Benefits
The DataArt team developed an intricate solution aimed at identifying potential failures as quickly as possible, essentially as they occur. This gives our client much more time to fix issues.
OpenStack technologies enable companies to be more innovative and to develop business models faster. However, for a company that depends heavily on its infrastructure, even a simple cluster failure can turn out to be disastrous.
The PoC project demonstrates that Machine Learning can be combined with OpenStack for monitoring purposes, saving companies from unnecessary problems. More importantly, the PoC demonstrates that OpenStack can be used together with Machine Learning algorithms to address a wide range of issues.
