I was recently sent a collection of analytical business questions by a customer. Of course I can’t share the details, but I was surprised by how sophisticated the questions were – they included product, time, demographic and geospatial dimensions.
The analytics team was trying to get answers to these questions for their executive team, and they were encountering time-outs trying to get the data out of their existing data warehouse. The vendor will go unnamed, but it is a traditional disk-based EDW platform.
The analytics team had heard we had built a SAP HANA system that included some of the data elements their questions were about, and they wanted to see if we could help answer them using SAP HANA.
In the meantime, I was reading Holger Mueller’s blog post Why is analytics so hard? Or: The holy grail. It’s an interesting read; come back here once you’ve read it. It left me with a bunch of thoughts:
What did we build in HANA?
Again, I can’t share the specifics, but I can tell you generally what we did. We loaded Point of Sale data (this is not a retail customer), a collection of master data such as Customer and Supplier, and reference data such as geospatial and demographic data.
What’s really cool about doing this in HANA is that it can use as many CPU cores as are available to process a question, and my test box has 160 of them. This means we can answer a question like “Tell me about the average sale price by customer, sliced by the customer’s age at the time of the transaction and grouped by political leaning”, and HANA returns the answer in 2-3 seconds. Ask any question you like, and keep asking.
We used both SAP Lumira and Tableau on top of this for visualization. Lumira is a little quicker, while Tableau has a lot more features. I came to some conclusions:
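To make that example question concrete, here is a minimal Python sketch of the same aggregation over in-memory rows. The record types, field names and ten-year age bands are hypothetical, since the actual model isn't shown here, and it groups by political leaning and age band rather than listing every customer. The sales are split into chunks that each produce partial sums and counts, which are then merged. That split-then-merge shape is what lets an engine spread a single aggregate across many cores, but this sketch runs the chunks one after another and is not how HANA itself executes queries.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical, simplified records; the real model's tables are not described.
@dataclass
class Customer:
    customer_id: int
    birth_date: date
    political_leaning: str  # assumed to come from joined demographic reference data

@dataclass
class Sale:
    customer_id: int
    sale_date: date
    price: float

def age_at(birth: date, on: date) -> int:
    """Age in whole years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

def age_band(age: int) -> str:
    """Bucket an age into a ten-year band such as '30-39' (an arbitrary choice)."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def partial_aggregate(sales, customers):
    """Sum and count of sale prices per (leaning, age band) for one chunk.

    Chunks are independent, so each could be handled by a separate core;
    in this sketch they simply run sequentially.
    """
    acc = defaultdict(lambda: [0.0, 0])
    for s in sales:
        c = customers[s.customer_id]
        # Age is computed at the time of each transaction, not today.
        key = (c.political_leaning, age_band(age_at(c.birth_date, s.sale_date)))
        acc[key][0] += s.price
        acc[key][1] += 1
    return acc

def average_price(sales, customers, n_chunks=4):
    """Average sale price per (leaning, age band), via per-chunk partials."""
    size = max(1, -(-len(sales) // n_chunks))  # ceiling division
    chunks = [sales[i:i + size] for i in range(0, len(sales), size)]
    merged = defaultdict(lambda: [0.0, 0])
    for part in (partial_aggregate(chunk, customers) for chunk in chunks):
        for key, (total, count) in part.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}
```

Merging sums and counts, rather than averaging the per-chunk averages, keeps the result correct when chunks hold different numbers of rows. A customer born on 15 June 1980 lands in the "20-29" band for a sale in March 2010 and in "30-39" for one in July 2010.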
1) HANA brings structured analytics into Holger’s innovation “Phase 3 – A business user can – with appropriate, but affordable training – use the innovation”
You have to be able to formulate questions and use a tool like Tableau or Lumira. I learnt how to use Tableau in a few hours with no training, and Lumira in a few minutes. But you have to know which questions you want to ask, and they have to be meaningful.
More specifically, we could answer some of the customer’s questions using the model we built, without the time-out problems.
2) Data quality problems always lurk under the surface
It won’t surprise anyone who has worked in data warehousing, but data quality is a big challenge. Some of the customer’s questions can’t easily be answered because their category hierarchy doesn’t allow for it. They have categories A-E, and one question asks about category F. Category F has business meaning, but their system hasn’t been updated to know that it exists.
This requires a change to how they process master data and assign categories to transactions, which we can actually do very easily in HANA. For instance, we could use publicly available reference data like SIC codes to derive the new categories and then reprocess the transactions. Because HANA works on the detailed transactions and we never need to pre-aggregate, we only have to do this once and we’re done.
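As a rough illustration of that reprocessing step, the sketch below re-tags transactions from a rule table keyed on SIC code prefixes. The longest matching prefix wins, so a narrow rule can carve the new category F out of a broader existing one. The codes, categories, field names and the UNCLASSIFIED fallback are all made-up placeholders, not a real mapping, and it assumes each transaction carries its supplier’s SIC code.

```python
from dataclasses import dataclass, replace

# Placeholder rules only: these prefixes are illustrative, not a real mapping.
# The narrower "4791" rule overrides the broader "47" rule, introducing "F".
CATEGORY_RULES = {
    "46": "B",
    "47": "C",
    "4791": "F",
}

@dataclass(frozen=True)
class Transaction:
    txn_id: int
    supplier_sic: str  # hypothetical field: supplier's SIC code
    amount: float
    category: str

def classify(sic_code: str, rules: dict[str, str], default: str = "UNCLASSIFIED") -> str:
    """Return the category of the longest rule prefix matching sic_code."""
    for length in range(len(sic_code), 0, -1):
        category = rules.get(sic_code[:length])
        if category is not None:
            return category
    return default

def reprocess(transactions, rules):
    """Return new transactions re-tagged under the updated rules; inputs are untouched."""
    return [replace(t, category=classify(t.supplier_sic, rules)) for t in transactions]
```

Because the detail rows are kept rather than rolled up into aggregates, changing the rules only means running `reprocess` once over them.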
3) The structured data we built so far is not enough
There are data elements we didn’t include in the initial model, so some of the questions can’t be answered yet. Some of the business questions are also sophisticated and based on the latest trends, and the model hasn’t yet evolved to cover them.
We can add this data to the structured model easily enough. Alternatively, with Tableau you can join a HANA model with data you load into your own copy of Tableau and then analyze the combination. But suddenly we’re in Innovation “Phase 2 – Through tools more trained professionals in the relevant technology can make the innovation happen”.
And with HANA, we have to be careful with our data model if we want sub-second performance on billions of records. If the structural change is substantial, that may push us into Innovation “Phase 1 – Only experts can apply the technique to make the innovation happen”.
Conclusion – the Layered Structured Architecture
And this is where my mind is headed: we have to classify how we want information to be made available, and what sorts of extensions to the analytics model different types of people can make.
Phase 3 users, for instance, will quickly find they can do quite advanced analytics. With Tableau you can easily join against Outcode data to do geospatial analysis, and it performs well when joined against customer data in HANA.
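Tableau handles this join itself, but the logic is simple enough to sketch. A UK postcode’s outcode (outward code) is everything before the final three-character inward code, so it can be derived even when the space is missing and then joined to an outcode-level reference table. The reference table, the region labels and the (customer_id, amount) sales tuples below are illustrative assumptions, not the data we used.

```python
from collections import defaultdict

# Illustrative outcode reference table; a real one would come from a
# published postcode dataset with many more outcodes and attributes.
OUTCODE_REGION = {
    "SW1A": "London",
    "M1": "North West",
    "EH1": "Scotland",
}

def outcode(postcode: str) -> str:
    """Outward code: the postcode minus its final 3-character inward code."""
    compact = postcode.replace(" ", "").upper()
    if len(compact) < 5:  # the shortest full postcodes, e.g. "M1 1AE", have 5 characters
        raise ValueError(f"not a full UK postcode: {postcode!r}")
    return compact[:-3]

def sales_by_region(customer_postcodes, sales, outcode_region):
    """Total sales per region, joining customers to the outcode table.

    customer_postcodes maps customer_id -> postcode; sales is an iterable of
    (customer_id, amount) pairs. Unmatched outcodes are grouped as "Unknown".
    """
    totals = defaultdict(float)
    for customer_id, amount in sales:
        code = outcode(customer_postcodes[customer_id])
        totals[outcode_region.get(code, "Unknown")] += amount
    return dict(totals)
```

Keeping unmatched outcodes in an explicit "Unknown" bucket, rather than dropping them, makes gaps in the reference data visible in the analysis.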
But if you want to include new transaction categories (data), then you’re in Phase 1, which means you need a process to regularly update your structured model to include the new things people are thinking about.
Good business people will keep asking more, and harder, questions. I think with HANA we have a platform that facilitates this rather than handcuffing us.