About the Client
GuestMetrics LLC delivers cloud-based analytics and reporting solutions to the food & beverage, hospitality, and financial services industries. The analytics platform transforms collected point-of-sale transactions into easy-to-understand market data.
More information: http://www.guestmetrics.com
The existing solution was built on an obsolete, heterogeneous technology stack. The variety of technologies and the lack of documentation drove up operating costs and complicated maintenance. The platform also had a fragile architecture and depended on legacy hardware, which made it even harder to maintain an already expensive data center and licenses.
The legacy platform could not meet performance demands: the POS transaction pipeline took up to two days to process one month of data.
The client was looking to transform the existing platform to minimize the overall operating costs, move the infrastructure to the cloud rather than maintain a cost-ineffective data center, and optimize the pipeline performance to process historical data in just a few hours.
Meeting the Challenge
The overall transformation was hampered by a lack of documentation, resources, and people who could answer questions. Essentially, the entire solution upgrade was treated as a reverse engineering exercise: DataArt’s team had to decipher the source code, the implementation and configuration details, and the deployment model without any prior technical insight.
The following approach was used to resolve the issues:
- Cloud enablement and DevOps streamlining. Data processing, storage, and all services were migrated to the AWS (Amazon Web Services) environment. Data processing runs on a DC/OS cluster built on AWS resources. All deployable components are now delivered as Docker containers, so they can be handed over to the newly created GuestMetrics Ops team and deployed more easily.
- Data management strategy. Instead of maintaining a Hive cluster-based solution using a mix of Java, Bash, SQL, HDFS, and Hive, the entire solution was converted into a portable data processing pipeline based on Apache Spark. This made it possible to eliminate dependencies and, more importantly, run the data pipeline continuously every six hours instead of once a month. As a result, the data pipeline met the client’s platform requirements.
- Dashboard and user management solution. The legacy platform was transformed using R and a Shiny server for visualizing data. Because migrating the Dashboard system would have been complex and time-consuming, the development team decided to keep the platform as a subcomponent of a more scalable and easily extensible solution. This made it possible to enable SSO integration and customization based on individual client needs.
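To illustrate the kind of transformation the data pipeline performs, the minimal sketch below rolls raw POS transactions up into monthly sales totals per product category. It is written in plain Python for readability and is not the production code, which runs on Apache Spark. The `Transaction` schema, the `monthly_sales_by_category` function, and the sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """One point-of-sale line item (hypothetical schema, for illustration only)."""
    venue_id: str
    category: str
    sold_on: date
    amount_cents: int


def monthly_sales_by_category(transactions):
    """Sum sales per (year, month, category) across all venues."""
    totals = defaultdict(int)
    for tx in transactions:
        key = (tx.sold_on.year, tx.sold_on.month, tx.category)
        totals[key] += tx.amount_cents
    return dict(totals)


# Made-up sample records; real input would come from collected POS data.
sample = [
    Transaction("venue-1", "beer", date(2017, 3, 4), 650),
    Transaction("venue-2", "beer", date(2017, 3, 9), 700),
    Transaction("venue-1", "wine", date(2017, 3, 4), 1200),
    Transaction("venue-1", "beer", date(2017, 4, 1), 650),
]

report = monthly_sales_by_category(sample)
for (year, month, category), cents in sorted(report.items()):
    print(f"{year}-{month:02d} {category}: ${cents / 100:.2f}")
```

In a distributed engine such as Spark, the same group-and-sum step would be expressed as a grouped aggregation over a partitioned data set, which is what allows the work to be spread across a cluster.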
The DataArt team transformed the pipeline into a new, fast, cost-effective, and optimized solution capable of quickly processing transactions. The solution was migrated to AWS, thus eliminating the hardware dependency and minimizing maintenance costs. Additional scalability was achieved through Docker containerization.
The earlier, slow data processing system prevented the Client from developing their business and from providing their customers with data promptly.
The new and more effective data processing pipeline helped the Client achieve their business goals and successfully meet customer requirements.
The high-level component architecture of the data pipeline:
DataArt performed in-depth research and provided comprehensive documentation for each step and component of the transformed solution.