Ed Simmons: Hello, everyone. I'm Ed Simmons with Matchbook Advisors. We have an exciting panel today with DataArt about future-proofing your data strategy. I'd like to introduce my co-panelists. First, I'd like to introduce Alexey, who's been with DataArt since 2004. He started as a system architect and quickly moved up to manage large transformational projects. Alexey runs DataArt’s data and analytics platform practice, covering everything from solution design to regulatory compliance.
These days, Alexey is DataArt's go-to expert on bank digitization and finance. Andrey, our other panelist, has been with DataArt since 2005. What's great is that the two were schoolmates, with Alexey joining DataArt a year earlier. Andrey runs account management and pre-sales for DataArt's finance practice and has led numerous efforts in the asset management world.
He's worked with everything from liquid investments to private equity, alternative data, and crypto. Today, we'll discuss several key topics: data as a competitive advantage for asset managers; what data products and platforms are and why they matter; how to modernize legacy systems; and, of course, the impact of AI on data platforms. Please post questions in the comment section on LinkedIn.
Without further ado, let's get into it. When we talk about data as a competitive advantage, to me, it's always been: Can I be first to market with new products? Can I generate more alpha from our existing products? Or can I do things cheaper than my competitors and potentially gain a financial advantage that way? With increasing data and analytic complexity and with AI, it's getting harder and harder to deliver on these three things.
Andrey, can you talk about what your customers are doing?
Andrey Ivanov: Yes, data is definitely a competitive advantage for asset managers. From an engineering standpoint, asset managers are essentially gigantic data processing machines—they collect data about the market, about what participants are doing, about companies to invest in, fundamentals, supply chains, and the general state of the economy. Many of our clients have been in the space for 20, 30, even 40 years, so they have a lot of historical data. The decisions asset managers make are all data-driven. Being able to process data efficiently, expose it via self-service interfaces, and ensure high data quality is extremely important.
At the end of the day, every asset manager seeks alpha—beating the others and generating extra returns. Since everything is data-driven, they also seek operational efficiencies. The better your data stack, the easier it is to be operationally efficient and focus your employees’ attention and time—your most valuable asset—on generating alpha.
Ed Simmons: I think when we divide the world into what's context and what's core to making money, the more you can move data plumbing and similar issues into context, and keep value-added activities in your core, the better. A modern platform definitely helps with that.
Alexey, your thoughts?
Alexey Utkin: I agree. There are really two parts we see: enabling alpha generation through advanced models and having the data to support all of that. But what we actually see more often with our clients in asset and investment management is the impact of the investments they're making in modernizing their data platforms and architectures. This allows them to operate better at scale, process unstructured data and larger volumes of data, share data with counterparties, and start to realistically use AI for data work as well as alpha generation. All of these things are difficult unless you have strong foundations in both technology and engineering operations.
We've had a number of successful client engagements in which we helped them improve how they work with data, architect for data, and deal with data. This has been very impactful. Now, they can get to investment analysis, portfolio analysis, quant work, and so on, which previously wasn't possible because they were bogged down with data quality issues and integration challenges.
Ed Simmons: We've all been there with data quality and integration issues preventing people from working on the good stuff. When you look at what an asset manager does, it's basically developing products for clients, and these products are all based on data. Data is the lifeblood of the asset management business.
We've been hearing a lot about data products and platforms. Alexey, what is a data product or data platform, and why is this concept so talked about these days?
Alexey Utkin: From a data analytics industry perspective, data as a product became popular about four or five years ago with the appearance of data mesh, where it's one of the pillars. It focuses on the value your data delivers to consumers in a business context. You have to step into their world and understand what they're trying to do, what form they need data in, and at what latency.
Within the data-as-a-product concept, there's a lot about data standardization and treating data like a product on a shelf—you need to understand what it is, what's inside, whether it fits your purpose, and how to consume it. Instead of just dumping data on users, you provide rich context and self-service access. Governance is also embedded into specific products, driving value for specific users.
Ed Simmons: Andrey, isn't this what we've been doing all along with things like security master files? What's different now?
Andrey Ivanov: Exactly. Data products aren't a revolution or a shiny new thing nobody has tried before. They're an evolution of what asset managers have been doing for years. My experience is similar. One of the first things I did in asset management was the security master, which had a well-defined interface and standards for exposing securities. Clients would ask how the data is actually used, which fields are important, and about data quality.
That's a data product. The security master I worked on from 2005 to 2009 was already a data product. The new thinking is to generalize this: don't treat isolated islands of data as products, but consider all pieces of your infrastructure (order management, investment life cycle systems, accounting, analytics) as data products. When you track how people use the data and what's important for which processes, you can leverage your data infrastructure in powerful ways to push your business further and extract more alpha.
Ed Simmons: Would you say that being more accountable to your internal customers—treating them like external customers—is a key change organizations need to adopt?
Andrey Ivanov: Yes, absolutely. It's always been true that you can't supply the wrong data to customers, internal or external. But really looking at what those internal customers want, how they're using your data, what's important, and what's missing is key. That paves the way for full self-service data access, breaking down silos, and enabling open data systems.
Ed Simmons: Alexey, you mentioned a data mesh. What is that, and how does it relate to data products?
Alexey Utkin: Data mesh is an architectural approach in which data products are one of the pillars. Another key aspect is decentralization—moving away from the idea that a central data team and platform can fit all organizational needs. Instead, you enable different domain-oriented teams, like those for portfolio management or risk management, to produce data products aligned with business functions. Data mesh supports this through a self-service data platform, which allows teams to produce and consume data products efficiently.
The last pillar is governance. To enable distribution and decentralization while maintaining self-service, you need to standardize and align things, like what constitutes a data product, data contracts, metadata, and interoperability. Data mesh brings together these pillars for a more efficient, distributed, and domain-oriented approach to high-value data products.
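To make the governance pillar a bit more concrete, here is a minimal, hypothetical sketch of what a data product contract might look like in code. The field names and the compatibility check are purely illustrative, not a specific standard mentioned by the panelists:

```python
from dataclasses import dataclass, field

@dataclass
class DataContract:
    """Illustrative contract a domain team publishes alongside its data product."""
    product: str                      # e.g. "portfolio-positions"
    owner: str                        # accountable domain team
    schema: dict[str, str]            # column name -> type
    freshness_sla_minutes: int        # maximum allowed data latency
    quality_checks: list[str] = field(default_factory=list)

positions_contract = DataContract(
    product="portfolio-positions",
    owner="portfolio-management",
    schema={"portfolio_id": "string", "isin": "string",
            "quantity": "decimal", "as_of": "timestamp"},
    freshness_sla_minutes=60,
    quality_checks=["no_null_isin", "quantity_non_negative"],
)

def compatible(consumer_needs: dict[str, str], contract: DataContract) -> bool:
    """Can a consumer's required columns be served by this product?"""
    return all(contract.schema.get(col) == typ for col, typ in consumer_needs.items())
```

With contracts like this captured as metadata, interoperability checks between independently owned products can be automated rather than negotiated case by case.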
Ed Simmons: From an infrastructure standpoint, I see advantages to distributed infrastructure. But how about things you need to keep central, like metadata definitions? How do we ensure interoperability and governance if everyone is doing their own thing?
Alexey Utkin: It can still be rational to standardize some things, like your core data model or core data elements. But you don't have to do it for everything, because it's expensive and time-consuming. For the rest, it comes down to capturing rich context and metadata, and using emerging standards for data products and contracts. That way, you can regulate independent data products and assets and achieve interoperability. Ultimately, you have to judge whether it's worth integrating and standardizing a piece of data.
Andrey Ivanov: In the old days, centralization was preferred because you typically had one database and not much automation. Today, you have advanced automation at every level, can span environments, move data, and expose metadata. Automated checks and tasks make governance easier, and compliance can be automated, rather than relying on everyone agreeing manually.
Ed Simmons: So as long as you agree on how different units will interoperate, you can give local autonomy, like state versus federal government. Local control speeds time to market, while federal control can cut costs. What do we do with really old systems? Do we move them, keep them in place, or decide case by case?
Andrey Ivanov: This is a big part of our work. Many asset managers have data infrastructure built around early 2000s tech—SQL Server, stored procedures, Sybase, Oracle. Now, new cloud platforms are in demand, and many developers prefer modern tools like Python. There’s also demand for distributed, high-performance analytics that older platforms can’t provide.
Modernization needs people who understand the problem space. Legacy systems have often been developed over 20 years with little documentation. Someone who understands the business domain will have an easier time understanding how the system works. That’s one ingredient. Another is experience—experts who’ve done this many times can help a lot.
Alexey Utkin: From client experience with large-scale migrations, it's important to assess and profile what you're dealing with. If you just do a lift-and-shift migration, you don't get much value, and you often end up migrating a lot of unused assets; in many cases 80% of dashboards, models, or data assets aren't used at all. You need to profile what's actually valuable before planning the migration.
AI can help here too: with data profiling, mapping, integration, and even validation of the target state. AI capabilities are increasingly useful for these projects.
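As a rough illustration of the usage profiling Alexey describes, here is a minimal sketch assuming you can export a query or access log from your BI tool or warehouse; the file and column names are hypothetical:

```python
import pandas as pd

# Hypothetical export: one row per query, with the asset (dashboard/table) and when it ran.
log = pd.read_csv("query_log.csv", parse_dates=["run_at"])

cutoff = pd.Timestamp.now() - pd.DateOffset(months=6)
usage = (
    log.groupby("asset_name")
       .agg(total_runs=("run_at", "size"),
            recent_runs=("run_at", lambda s: (s >= cutoff).sum()),
            last_used=("run_at", "max"))
       .sort_values("recent_runs", ascending=False)
)

# Assets with no activity in the last six months are candidates to leave behind.
unused = usage[usage["recent_runs"] == 0]
print(f"{len(unused)} of {len(usage)} assets untouched in the last 6 months")
```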
Ed Simmons: Andrey, what about legacy systems you don’t want to modernize because they’re too expensive or they perform critical functions?
Andrey Ivanov: You don’t always have to rewrite or refactor everything. For example, accounting systems are very complex, and we typically don’t recommend rebuilding them. Instead, you can build a facade around them—let the system do its core function and build a REST API to expose it to modern parts of your data stack. Sometimes it can be containerized or mirrored into the cloud. There are different ways to implement a facade, but you don’t have to upgrade every part of the system—some can be left as black boxes with well-defined boundaries.
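A minimal sketch of the facade pattern Andrey mentions, using FastAPI purely as an example framework; the endpoint and the legacy call are hypothetical placeholders for whatever the accounting system actually exposes:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Accounting facade")

class Position(BaseModel):
    portfolio_id: str
    isin: str
    quantity: float
    market_value: float

def fetch_from_legacy(portfolio_id: str) -> list[dict]:
    """Placeholder for whatever the legacy system offers:
    a stored procedure call, a flat-file export, or a vendor SDK."""
    raise NotImplementedError

@app.get("/portfolios/{portfolio_id}/positions", response_model=list[Position])
def get_positions(portfolio_id: str) -> list[Position]:
    rows = fetch_from_legacy(portfolio_id)
    if not rows:
        raise HTTPException(status_code=404, detail="portfolio not found")
    return [Position(**row) for row in rows]
```

The legacy system keeps doing its core job as a black box; the facade gives the modern data stack a clean, well-defined boundary to consume.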
Ed Simmons: It's an interesting question—what to modernize and what not to. Often, legacy systems perform multiple functions, and while some are easy to modernize, others are not. This area is still as much art and financial modeling as science. Alexey, what about using AI to modernize legacy systems?
Alexey Utkin: Large language models and generative AI are being applied for translation (e.g., ETL logic), data modeling, and schema mapping between old and new systems. Many data tools now have AI capabilities. Snowflake, Databricks, and others are adding AI features. You can also use your own models or open-source models for these tasks.
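As a rough illustration of the translation use case, here is a hedged sketch that asks an LLM to convert a legacy stored procedure into PySpark. It assumes an OpenAI-compatible endpoint; the model name and file are only examples:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any compatible endpoint works

legacy_sql = open("sp_load_positions.sql").read()  # a legacy T-SQL stored procedure

prompt = (
    "Translate this T-SQL stored procedure into an equivalent PySpark job. "
    "Preserve the business logic exactly and flag anything ambiguous as a TODO.\n\n"
    + legacy_sql
)

response = client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[{"role": "user", "content": prompt}],
)

draft = response.choices[0].message.content
print(draft)  # a first draft only; it still needs review and validation against the old output
```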
Ed Simmons: When we start thinking about AI and data management, I see three areas: using AI to improve the data supply chain, using AI to create alpha, and using AI to deliver exactly what customers want. Let's start with the first—using AI to improve the supply chain. Andrey, what are your thoughts on quality and observability?
Andrey Ivanov: AI, especially neural networks, can help with data quality. For example, we’ve automated data quality checks, like flagging unusual price jumps, using AI. Neural networks can infer whether a signal is valid based on typical patterns. There’s a lot of potential for AI in data quality.
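The price-jump example doesn't need a neural network to illustrate the idea; a much simpler statistical sketch of the same check, with hypothetical column names, might look like this:

```python
import pandas as pd

prices = pd.read_csv("prices.csv", parse_dates=["date"])  # columns: date, ticker, close

def flag_price_jumps(df: pd.DataFrame, window: int = 20, threshold: float = 5.0) -> pd.DataFrame:
    """Flag daily returns more than `threshold` rolling standard deviations
    away from the recent mean, per ticker."""
    df = df.sort_values(["ticker", "date"]).copy()
    df["ret"] = df.groupby("ticker")["close"].pct_change()
    grp = df.groupby("ticker")["ret"]
    mean = grp.transform(lambda s: s.rolling(window).mean())
    std = grp.transform(lambda s: s.rolling(window).std())
    df["suspect"] = (df["ret"] - mean).abs() > threshold * std
    return df[df["suspect"]]

print(flag_price_jumps(prices)[["date", "ticker", "ret"]])
```

A learned model can go further by conditioning on typical patterns for each instrument, but the workflow is the same: flag, review, and feed the reviewed results back into the checks.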
Ed Simmons: What about AI for building integrations, Alexey?
Alexey Utkin: Historically, most data projects have struggled to deliver value at the right price point. AI can change this equation by automating rudimentary, labor-intensive tasks like integrating data, mapping formats, validating, and modeling. This work used to take months of data engineering; with AI, it's much faster. AI also helps with populating data catalogs and semantics, making data more discoverable and usable.
Ed Simmons: You mentioned using AI to search your data. With big companies and complex data, using AI to help assemble data products is really interesting.
Alexey Utkin: Yes, AI can help search and discover data, even if it’s unstructured or lacks good descriptions. It can analyze structure, values, and usage patterns to infer meaning. For high-accuracy use cases, you may need to shape your data into richer formats like semantic layers or knowledge graphs to enable accurate, context-rich answers.
Andrey Ivanov: One interesting thing we’ve done with AI is building self-service APIs for data access. Large language models act as a universal translation layer: users can input questions in English, and the system translates them into SQL or Python queries. For example, we built a connector for Excel where users can type queries in plain English, and the system fetches data from the backend. This enables self-service in a familiar environment.
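A minimal sketch of the translation layer Andrey describes, assuming an OpenAI-compatible LLM endpoint and a read-only connection; the table schema, model name, and cleanup logic are illustrative only:

```python
import sqlite3
from openai import OpenAI

client = OpenAI()
conn = sqlite3.connect("analytics.db")  # stands in for a read-only warehouse connection

SCHEMA = "positions(portfolio_id TEXT, isin TEXT, quantity REAL, market_value REAL, as_of TEXT)"

def ask(question: str) -> list[tuple]:
    """Translate an English question into SQL and run it against the database."""
    prompt = (
        f"Given the table {SCHEMA}, write a single read-only SQLite query that answers: "
        f"{question}\nReturn only the SQL."
    )
    raw = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    sql = raw.strip().strip("`").removeprefix("sql").strip()  # naive cleanup of markdown fences
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql}")
    return conn.execute(sql).fetchall()

print(ask("What are the five largest positions by market value?"))
```

A production version needs far stronger guardrails (query validation, row limits, permissions), but this is the shape of the self-service layer: plain English in, governed data out.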
Ed Simmons: You quickly moved from using AI to build data to using AI for analytics. What are you seeing in terms of using AI for alpha generation, and what about explainability?
Andrey Ivanov: Clients in asset management are more conservative about deploying AI for alpha generation because the industry is heavily regulated. Firms need to prove that AI systems follow all regulatory rules. Right now, AI is mainly used for development, data sourcing, and quality checks. Once AI explainability advances, we may see more deployment for alpha generation.
Alexey Utkin: Even before generative AI, machine learning was used for financial analytics, and regulatory frameworks for model validation and governance existed. These have become stricter for AI. Companies often experiment with AI models alongside traditional risk models, using explainable models for critical decisions. Techniques exist to build explainable models with similar accuracy, which helps meet regulatory requirements.
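One common technique along the lines Alexey describes is keeping an interpretable model alongside a more complex one and comparing their accuracy; a minimal scikit-learn sketch with placeholder data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Placeholder features (e.g. factor exposures) and a binary label (e.g. outperform / underperform).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

interpretable = LogisticRegression(max_iter=1000)
black_box = GradientBoostingClassifier()

for name, model in [("logistic", interpretable), ("boosting", black_box)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")

# If the accuracy gap is small, the interpretable model's coefficients give a
# defensible, auditable explanation of what drives each decision.
interpretable.fit(X, y)
print(interpretable.coef_)
```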
Ed Simmons: The key is that when you move from unexplainable to explainable models, you have a big lineage problem. You need to track what you’re doing, and the ability to recreate data at a point in time becomes critical, as does model lifecycle management. Many firms aren’t ready for this yet; it’s a journey from both an alpha generation and a governance perspective.
Let’s sum up and take some questions. We talked about data as both an alpha generator and an efficiency driver for asset managers, touched on data products, platforms, and meshes, including facade patterns and techniques, and discussed AI as both an enabler of data and a potential alpha generator.
First audience question: With the increasing volume of alternative data sources, how should asset managers evaluate and integrate which ones are worth using?
Andrey Ivanov: Expertise is important. For example, we had a client interested in satellite feeds for land development tracking. Satellite feeds differ not just in price or resolution, but also in metadata. Some provide height data, which is crucial for tracking development projects. You need someone who understands these sources and can guide you. Cost is also a factor—some data sets are very large and expensive to store or ingest. Many alternative data sources are built for human consumption, not machines, so AI can help process semi-structured or unstructured data. Having a proper development, test, and production environment lets you evaluate new sources in a sandbox before going live.
Ed Simmons: Would you say a modern data platform is a big advantage for evaluating new data sources?
Andrey Ivanov: Absolutely. On the cloud with a modern data lake or lakehouse platform, you get agility and speed. You can ingest and analyze data faster, with more control and quicker turnaround on issues.
Ed Simmons: Next question: When you speak about data products and decentralization by teams, are you talking about operations inside the firewall, or do clients expand this outside the firewall for strategic alliances?
Alexey Utkin: Inside the firewall, decentralization enables domain-oriented or product teams to work on data aligned with business functions, and can help when centralizing legacy infrastructure is too slow or complex. But the most exciting part is collaboration beyond the firewall—sharing data with counterparties, using platforms like Snowflake for data sharing, or creating data marketplaces for internal or external use. Some asset managers spin off data product businesses to share with peers.
Andrey Ivanov: In the old days, integration across organizations was mostly files on FTP, which isn’t real integration. Now, platforms like Snowflake are champions in data sharing, and we see clients using Snowflake as a data delivery mechanism for administrators and others.
Alexey Utkin: Yes, Snowflake is very popular in asset management, and its sharing and marketplace capabilities are well established. Databricks and others are catching up, but Snowflake is the leader in this space.
Ed Simmons: Another question: What’s the hardest part about migration, and what AI tools can help?
Alexey Utkin: For me, the hardest part is planning and validation. With smart planning, you can achieve migration with much less effort and more benefits. AI tools from Snowflake, Fivetran, cloud providers, and open-source models can help. But planning and validation are key.
Andrey Ivanov: Migration usually consists of data entry and validation, analytics, and delivery. We often modernize the front end first, but data entry and validation are the hardest, especially with complex, organically grown logic. For AI, tools like Claude are popular for understanding and reverse engineering code.
Ed Simmons: Last question: Lineage is a real issue if data is used for decision automation. There are platforms to track lineage, but building all this infrastructure is a heavy lift. Thoughts?
Alexey Utkin: I agree, it’s a heavy lift. Technologies enable a lot of this—tracking lineage, temporal querying, capturing metadata—but in reality, most organizations aren’t there yet. It’s a matter of deciding how much to invest in lineage and governance as part of your AI enablement strategy.
Ed Simmons: I think that’s all we have time for. Thank you to both panelists and the audience. Please contact us at sales@dataart.com with questions or ideas for future webinars. Thank you, everyone.