abstractable.io

Perth Solutioniser School: Data Edition

Patterns for Value from Data

Cristian Southall

We’ve just wrapped up another Perth Solutioniser School Meetup. The objective of tonight’s ‘Data Edition’ was to discuss key architecture patterns for realising value from enterprise data. Favouring breadth over depth, we covered these patterns as best we could within the limited time. Thanks to Van Zyl Kruger for presenting and sharing his insights on implementing these patterns in enterprises across a variety of industries. Thanks also to the attendees for their participation. We doubled attendance from the last session!

Given that Van Zyl’s presentation was periodically interrupted (to the benefit of the session) by a steady flow of questions, it is difficult to give a structured account of proceedings. However, points of note include:

Data Lake

  • We discussed “shifting right”. That is, using the Data Lake pattern to shift the work of data preparation and reporting closer to end-users rather than incurring this effort in the population of Data Warehouses/Marts and pre-preparing reports that often fail to anticipate the needs of end-users.
  • Van Zyl agreed this strategy can deliver real benefit, with the caveat that it typically only works for expert, data-savvy users with the skills to parse and combine their own data. Generalist business users who lack these skills still require data to be prepared for them (be it in reports or alternative analytical tools).
  • An interesting parallel was drawn between the Data Lake pattern and the Operational Data Store pattern (notwithstanding the enhanced capability of modern Data Lake technologies to ingest unstructured data).
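The “shifting right” idea above can be sketched in a few lines. This is a hedged illustration only: the file contents, field names, and segments are invented, and in practice the raw extracts would sit in lake storage rather than in string literals. The point is that a data-savvy user parses and combines raw extracts themselves, rather than waiting for a pre-built warehouse report.

```python
# Illustrative sketch of "shifting right": raw lake extracts are exposed
# as-is, and an expert user joins and shapes them ad hoc. All names and
# values here are invented for the example.
import csv
import io

# Stand-ins for two raw extracts landed in the lake.
orders = "order_id,customer_id,amount\n1,C1,100\n2,C2,40\n"
customers = "customer_id,segment\nC1,enterprise\nC2,smb\n"

# End-user-side preparation: parse and combine the raw data directly.
cust = {r["customer_id"]: r["segment"]
        for r in csv.DictReader(io.StringIO(customers))}

by_segment = {}
for r in csv.DictReader(io.StringIO(orders)):
    seg = cust[r["customer_id"]]
    by_segment[seg] = by_segment.get(seg, 0.0) + float(r["amount"])

print(by_segment)  # {'enterprise': 100.0, 'smb': 40.0}
```

The caveat from the discussion applies directly: this only works for users comfortable doing the join and aggregation themselves.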

Data Warehouses/Data Marts

  • Some insights were shared on how increasing enterprise data volumes drive adoption of the various patterns. Specifically, Van Zyl noted that Data Warehouses/Marts require periodic rebuilding to account for infrequent but profound changes to certain data (e.g. enumerations, typologies). At a certain volume of data, these rebuilds become infeasible due to processing lead times or processing cost.
  • It was queried whether Data Lakes should standardise on a single data storage format. The advice was to leverage the full array of available storage types as needed to suit the data at hand.
  • With the caveat that success might be measured in a number of ways, I noted a long-running joke among certain Architects that we had never seen a successful Data Warehouse implementation. When asked how true this might be, Van Zyl conceded that whilst he had seen successful implementations, he couldn’t reasonably point to a highly-successful one. He noted that ‘self-service’ has been the biggest failure for Data Warehouses and that increased focus on enterprise agility is making the task of creating and sustaining Data Warehouses more challenging. He noted that the key measure of success is whether a Data Warehouse remains in broad, active use over the longer term. He further noted that he has built a number of solutions that meet this success criterion but that they might not be understood as Data Warehouses per se (but instead as more generic ‘decision support systems’).
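The rebuild problem Van Zyl described can be made concrete with a toy example. This is a sketch under assumed names (the mapping, fact records, and fields are invented): when a coding scheme or typology changes, every historical fact keyed on the old codes must be re-mapped, so the rebuild is a full pass whose cost grows with total data volume.

```python
# Hypothetical sketch: why a typology change forces a warehouse rebuild.
# The old category codes are re-grouped into a new scheme, and every
# historical fact must be re-mapped -- an O(total volume) operation.

old_to_new = {"CAT-A": "GROUP-1", "CAT-B": "GROUP-1", "CAT-C": "GROUP-2"}

facts = [
    {"sale_id": 1, "category": "CAT-A", "amount": 100.0},
    {"sale_id": 2, "category": "CAT-C", "amount": 250.0},
]

def rebuild(facts, mapping):
    """Re-map every historical fact to the new typology."""
    return [{**f, "category": mapping[f["category"]]} for f in facts]

rebuilt = rebuild(facts, old_to_new)
print(rebuilt[0]["category"])  # GROUP-1
```

With two rows this is trivial; at warehouse scale the same full pass is what makes the rebuild infeasible on lead time or cost.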

LakeHouse

  • We had a relatively brief discussion of the emerging LakeHouse pattern, in which a Data Warehouse is given direct access to the Data Lake rather than being fed from it via ETL. It was noted that this pattern is likely to gain popularity given the significant amount of data integration and middleware it eliminates from the enterprise landscape, but that the technology remains immature.
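The distinguishing move can be sketched as follows. This is not tied to any specific LakeHouse product, and the file names and fields are invented: a warehouse-style aggregation runs directly over the files sitting in the lake, with no ETL copy into a separate warehouse store.

```python
# Illustrative sketch: a warehouse-style query executed in place over
# raw lake files, with no staging or load step. Names are invented.
import collections
import csv
import io

# Stand-ins for raw files sitting in the lake.
lake_files = {
    "sales_2023.csv": "region,amount\nwest,100\neast,40\n",
    "sales_2024.csv": "region,amount\nwest,60\neast,90\n",
}

def query_lake(files):
    """Aggregate across lake files directly -- the ETL copy is gone."""
    totals = collections.defaultdict(float)
    for content in files.values():
        for row in csv.DictReader(io.StringIO(content)):
            totals[row["region"]] += float(row["amount"])
    return dict(totals)

print(query_lake(lake_files))  # {'west': 160.0, 'east': 130.0}
```

Real LakeHouse engines add table formats, transactions, and indexing on top of this idea, which is where the maturity questions raised in the session come in.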

Data Vault

  • Whilst there is a strong rationale for this pattern, it exhibits similar characteristics to the Data Warehouse. Specifically, implementations are coupled tightly to the data and data access patterns, and thus require deep understanding of user requirements. It is also dependent on heavyweight data integration.
  • Van Zyl pointed to a number of implementations in Australia, noting that the pattern was particularly popular 5-10 years ago, especially in the insurance industry. Not many of these are still running successfully today.
  • The pattern is particularly suited to wide data sets, not deep ones.
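For readers unfamiliar with the pattern, a minimal sketch of its core structures may help. The table and column names below are illustrative, not from the session: a Hub holds the stable business key, a Satellite holds historised descriptive attributes (each change is a new row), and Links (omitted here for brevity) relate Hubs to one another. The tight coupling noted above follows from how these structures must mirror the source data and its access patterns.

```python
# Minimal Data Vault sketch using sqlite3 (stdlib). All names are
# invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk   INTEGER PRIMARY KEY,   -- surrogate key
    customer_bk   TEXT UNIQUE NOT NULL,  -- stable business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   INTEGER NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,         -- each change is a new row
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
""")

conn.execute("INSERT INTO hub_customer VALUES (1, 'CUST-001', '2024-01-01', 'crm')")
# Two satellite rows: the customer's details changed over time.
conn.execute("INSERT INTO sat_customer_details VALUES (1, '2024-01-01', 'Ada', 'Perth')")
conn.execute("INSERT INTO sat_customer_details VALUES (1, '2024-06-01', 'Ada', 'Sydney')")

# Latest known details for the customer.
row = conn.execute("""
    SELECT h.customer_bk, s.city
    FROM hub_customer h
    JOIN sat_customer_details s ON s.customer_hk = h.customer_hk
    ORDER BY s.load_date DESC LIMIT 1
""").fetchone()
print(row)  # ('CUST-001', 'Sydney')
```

Note the wide/deep point above: the pattern copes well with many attributes spread across Satellites, but historising deep, fast-changing data multiplies Satellite rows and load complexity.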
