Solution
DataArt was chosen as a trusted development partner with strong experience in building Big Data and IoT-based solutions.
The DataArt team was responsible for reviewing and improving cluster configuration for better performance, investigating and fixing issues with the Cassandra data schema, and developing Spark jobs for both parts of the prototype.
Our team identified several major issues during the knowledge transfer:
- Spark and Cassandra clusters were set up on the same machines and configured in a way that caused performance and networking issues;
- The Cassandra data schema wasn’t optimized for client queries and needed secondary indexes as a minimum acceptance criterion;
- The client didn’t have any experience with Apache Zeppelin.
Our team suggested all the necessary changes to the Spark and Cassandra cluster configuration and to the data schema to improve performance, and provided the client with all the necessary details and instructions for using Apache Zeppelin.
Our solution was written on top of an open source distributed computing framework, Apache Spark, using the Scala programming language to create a scalable architecture capable of processing large volumes of data. The machine-learning algorithm was implemented with the Spark MLlib library. After a linear regression model was calculated for each boiler, its coefficients were saved to a Cassandra table and used to visualize energy consumption in the client's mobile application. To make the development process faster and simpler, we used Zeppelin notebooks for demos, visualizations and tests.
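The per-boiler regression step can be illustrated with a minimal sketch. It is not the project's Scala/Spark MLlib job: it uses plain Python and a closed-form least-squares fit so that it runs without a cluster. The reading layout (boiler id, outdoor temperature, energy consumed), the single input feature, and the function names are all assumptions made for illustration. The returned dictionary stands in for the Cassandra table of coefficients.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical reading layout (an assumption, not the project's schema):
# (boiler_id, outdoor_temp_c, energy_kwh)
Reading = Tuple[str, float, float]


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two readings to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_per_boiler(readings: Iterable[Reading]) -> Dict[str, Tuple[float, float]]:
    """Group readings by boiler and fit one model per boiler.

    Returns {boiler_id: (intercept, slope)}; in a real pipeline these
    coefficients would be written to a datastore instead of returned.
    """
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for boiler_id, temp_c, energy_kwh in readings:
        grouped[boiler_id].append((temp_c, energy_kwh))
    return {boiler_id: fit_line(pts) for boiler_id, pts in grouped.items()}


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    sample = [
        ("boiler-1", 0.0, 10.0),
        ("boiler-1", 10.0, 8.0),
        ("boiler-1", 20.0, 6.0),
        ("boiler-2", 0.0, 5.0),
        ("boiler-2", 10.0, 5.0),
        ("boiler-2", 20.0, 5.0),
    ]
    for boiler_id, (intercept, slope) in sorted(fit_per_boiler(sample).items()):
        print(boiler_id, round(intercept, 3), round(slope, 3))
```

Because each boiler gets its own small model, a client application only needs the stored intercept and slope to draw a consumption line. This would explain why saving coefficients rather than raw predictions is enough for visualization.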
