
The system uses an event-driven architecture: external events, such as files uploaded by SFTP clients to designated Amazon S3 buckets, automatically trigger processing. Once a file is uploaded, AWS Lambda functions start the corresponding workflow in AWS Batch or AWS Step Functions.
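The upload-to-workflow handoff described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the size-based routing rule, the BATCH_THRESHOLD_BYTES value, the job name, and the JOB_QUEUE, JOB_DEFINITION, and STATE_MACHINE_ARN environment variables are all assumptions made for the example. Only the S3 event notification layout and the boto3 calls (submit_job, start_execution) are standard.

```python
import json
import os
from urllib.parse import unquote_plus

# Hypothetical rule: files larger than this are sent to AWS Batch,
# smaller ones to a Step Functions state machine.
BATCH_THRESHOLD_BYTES = 100 * 1024 * 1024


def extract_uploads(event):
    """Return (bucket, key, size) for each object in an S3 event notification."""
    uploads = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        uploads.append((s3["bucket"]["name"], key, s3["object"].get("size", 0)))
    return uploads


def choose_target(size_bytes):
    """Pick a workflow engine for a file (illustrative rule only)."""
    return "batch" if size_bytes > BATCH_THRESHOLD_BYTES else "step_functions"


def handler(event, context):
    """Lambda entry point: start one workflow per uploaded file."""
    import boto3  # imported lazily so the helpers above run without AWS access

    results = []
    for bucket, key, size in extract_uploads(event):
        target = choose_target(size)
        payload = {"bucket": bucket, "key": key}
        if target == "batch":
            boto3.client("batch").submit_job(
                jobName="ingest-file",  # hypothetical job name
                jobQueue=os.environ["JOB_QUEUE"],
                jobDefinition=os.environ["JOB_DEFINITION"],
                parameters=payload,
            )
        else:
            boto3.client("stepfunctions").start_execution(
                stateMachineArn=os.environ["STATE_MACHINE_ARN"],
                input=json.dumps(payload),
            )
        results.append({"key": key, "target": target})
    return results
```

Keeping the event parsing and routing in plain functions means they can be unit-tested with Pytest without any AWS credentials; only the handler touches AWS.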
Key Components of the Solution
The system's architecture is organized into several layers:
- ELT Layer: Leverages AWS Step Functions, Lambda, and Batch for efficient data processing and transformation.
- Internal Data Layer: Utilizes Snowflake, Amazon Aurora PostgreSQL, and Amazon S3 for optimized data storage and analytics.
- Service Layer: Runs microservices on Amazon ECS with AWS Fargate and exposes them through APIs. An Application Load Balancer distributes incoming traffic across services, while Amazon CloudFront caches and delivers content closer to users.
- Client Layer: Provides a user-friendly interface built with JavaScript and React.
- Monitoring Layer: Uses Amazon CloudWatch for real-time system monitoring and logging.
- External Data Layer: Enables secure data transfers via SFTP and Amazon S3.
- Code and Infrastructure Management: Streamlines deployment with GitHub, GitHub Actions, Docker, and Terraform.
Outcomes
By building on AWS services, the platform scales more easily, simplifies operations, and keeps rights data accurate and up to date.
Key achievements include:
- Automated metadata ingestion: Supports XLSX, CSV, DDEX, CWR, and BWARM file formats, integrating data into Snowflake.
- Advanced data modeling: Maintains confidentiality while enabling dataset validation, correction, and cross-stakeholder matching.
- Asset metadata processing: Supports distinct yet interlinked entities, allowing analysis of multiple versions across datasets.
- Optimized data flow: Automates the entire process from SFTP-based data ingestion to generating enriched output files with suggested matches.
- User-friendly UI: Allows users to process and manage data without direct interaction with AWS services.
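The source does not say how suggested matches are computed. Purely as an illustration of the idea of proposing cross-dataset matches, the sketch below uses fuzzy string comparison from Python's standard library as a stand-in. The function names, the title-only comparison, and the 0.8 cutoff are all assumptions for this example.

```python
from difflib import get_close_matches


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(title.lower().split())


def suggest_matches(left_titles, right_titles, cutoff=0.8):
    """Map each left title to its closest right title, or None below the cutoff."""
    # Index right-hand titles by normalized form (the last one wins on collisions).
    by_normalized = {normalize(t): t for t in right_titles}
    suggestions = {}
    for title in left_titles:
        hits = get_close_matches(
            normalize(title), list(by_normalized), n=1, cutoff=cutoff
        )
        suggestions[title] = by_normalized[hits[0]] if hits else None
    return suggestions
```

A production matcher would likely compare several fields (for example identifiers and contributor names) rather than titles alone, and would leave the final decision to a reviewer, which is why the output here is a suggestion rather than a merge.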
Business Benefits
- Increased Efficiency: Automated workflows minimize manual effort, streamlining metadata management and royalty processing.
- Improved Accuracy: The data model enables better validation and correction, keeping rights data precise and up to date.
- Scalability and Flexibility: Cloud-based infrastructure ensures the platform scales dynamically to accommodate growing data volumes and evolving industry needs.
- Enhanced Security: The solution maintains strict data segregation, protecting confidentiality while still allowing stakeholders to collaborate.
- Cost Optimization: A Total Cost of Ownership (TCO) analysis confirmed that the AWS-based infrastructure reduced TCO by 12% while maintaining high performance.
Technology
Python, FastAPI, AWS Boto3, Pytest, SQLAlchemy, Snowflake Python Connector, Psycopg2, Selenium, BeautifulSoup, Pandas, NumPy
JavaScript, ReactJS, Redux, HTML5, CSS3, Vite
Amazon Aurora PostgreSQL, Snowflake, dbt, Amazon S3
AWS Step Functions, Lambda, Batch, Fargate, CloudTrail, Amazon CloudWatch
Terraform, GitHub, GitHub Actions
