Energy Consumption Prediction Model Using Spark MLlib

Client

The client is a manufacturer of heating and industrial systems.

Business Challenge

The client is working on an IoT prototype for their boiler to integrate well-known hardware with the Internet. Sensors from the device send notifications to the server, which saves binary values to a Cassandra database. The client wanted to use Apache Zeppelin for the visualization and a 3-node cluster for Spark and Cassandra.

The idea of the prototype consisted of two parts:

create a Spark job for converting data into a readable format;
develop a machine-learning algorithm for an energy-consumption prediction model based on the history of boiler usage coupled with a weather forecast.
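
The first part, converting stored binary values into a readable format, can be illustrated with a minimal sketch in plain Scala. The case study does not describe the actual binary layout, so everything here is an assumption made for illustration: the `Reading` case class, the `ReadingCodec` object, and the 20-byte big-endian record (4-byte boiler id, 8-byte timestamp, 8-byte energy value) are all hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical readable form of one sensor notification.
final case class Reading(boilerId: Int, timestampMs: Long, energyKwh: Double)

object ReadingCodec {
  // Assumed layout (not from the case study): 4-byte id, 8-byte epoch millis,
  // 8-byte double, all big-endian, 20 bytes in total.
  val RecordSize: Int = 4 + 8 + 8

  // Packs a reading into the assumed binary layout (useful for building test data).
  def encode(r: Reading): Array[Byte] = {
    val buf = ByteBuffer.allocate(RecordSize).order(ByteOrder.BIG_ENDIAN)
    buf.putInt(r.boilerId).putLong(r.timestampMs).putDouble(r.energyKwh)
    buf.array()
  }

  // Returns None for payloads that do not match the assumed record size.
  def decode(bytes: Array[Byte]): Option[Reading] =
    if (bytes.length != RecordSize) None
    else {
      val buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN)
      Some(Reading(buf.getInt(), buf.getLong(), buf.getDouble()))
    }
}
```

In a Spark job, a function like `decode` could be applied to each row read from Cassandra, with malformed payloads dropped or logged rather than failing the whole job.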

Solution

DataArt was chosen as a trusted development partner with strong experience in building Big Data and IoT-based solutions.

The DataArt team was responsible for reviewing and improving cluster configuration for better performance, investigating and fixing issues with the Cassandra data schema, and developing Spark jobs for both parts of the prototype.

Our team identified several major issues during the knowledge transfer:

Spark and Cassandra clusters were set up on the same machines and configured in a way that caused performance and networking issues;
The Cassandra data schema wasn't optimized for client queries and needed secondary indexes as a minimum acceptance criterion;
The client didn't have any experience with Apache Zeppelin.

Our team suggested all the necessary changes to the Spark and Cassandra cluster configuration and to the data schema to improve performance, and provided the Client with all the necessary details and instructions for using Apache Zeppelin.

Our solution was written on top of an open-source distributed computing framework, Apache Spark, using the Scala programming language to create a scalable architecture able to handle large volumes of data. The machine-learning algorithm was implemented with the Spark MLlib library. After calculating a linear regression model for each boiler, coefficients were saved to a Cassandra table and were used for visualizing energy consumption in the Client Mobile Application. To make the development process faster and simpler, we used Zeppelin notebooks for demos, visualizations and tests.
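
To make the "one linear regression model per boiler" idea concrete, here is a minimal sketch in plain Scala. It deliberately does not use Spark MLlib: it swaps in closed-form ordinary least squares on a single feature so that it runs without a cluster. The names `Observation`, `Coefficients` and `BoilerRegression`, and the choice of outdoor temperature as the only feature, are assumptions for illustration and are not taken from the project.

```scala
// One historical observation; the single feature (outdoor temperature) is an assumption.
final case class Observation(boilerId: Int, outdoorTempC: Double, energyKwh: Double)

// Fitted line: energyKwh is approximated by intercept + slope * outdoorTempC.
final case class Coefficients(intercept: Double, slope: Double) {
  def predict(outdoorTempC: Double): Double = intercept + slope * outdoorTempC
}

object BoilerRegression {
  // Closed-form ordinary least squares for one feature.
  // Returns None with fewer than two points or when the feature has no variance.
  def fit(points: Seq[Observation]): Option[Coefficients] = {
    val n = points.size
    if (n < 2) None
    else {
      val meanX = points.map(_.outdoorTempC).sum / n
      val meanY = points.map(_.energyKwh).sum / n
      val sxx = points.map(p => math.pow(p.outdoorTempC - meanX, 2)).sum
      val sxy = points.map(p => (p.outdoorTempC - meanX) * (p.energyKwh - meanY)).sum
      if (sxx == 0.0) None
      else {
        val slope = sxy / sxx
        Some(Coefficients(meanY - slope * meanX, slope))
      }
    }
  }

  // Fits one model per boiler; boilers without enough data are skipped.
  def fitPerBoiler(history: Seq[Observation]): Map[Int, Coefficients] =
    (for {
      (id, obs) <- history.groupBy(_.boilerId).toSeq
      coeffs <- fit(obs)
    } yield id -> coeffs).toMap
}
```

The resulting `Map[Int, Coefficients]` corresponds to what would be written to a coefficients table keyed by boiler, so that a client application only needs the two numbers and a forecast temperature to call `predict`.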

Technology

Apache Spark

Spark SQL

Spark MLlib

DeviceHive

Apache Cassandra

Scala

Impact

The solution developed by DataArt was based on an open-source stack, and its main goal was to demonstrate how industrial devices could be used as part of an IoT ecosystem. As a result, we:

Designed a Cassandra data model that fits the Client's business requirements;
Helped with Spark and Cassandra cluster configuration and made changes to the architecture based on our Big Data experience;
Developed a Spark job for migrating binary Cassandra data to a human-readable format;
Created a linear regression model for predicting energy consumption and implemented the calculation in a separate Spark job (using Spark MLlib).
