Integrate all data in a centralized Datamart
Updated: Apr 21, 2020
Once you have pulled data via API, the next step is to decide on a cloud solution where this data will be extracted and store. There are many cloud solutions providers viz. Microsoft (Azure), Amazon (AWS), Google Cloud etc. Here, in this article, we will be discussing how we can use Microsoft Azure cloud to implement a Data Mart.
Choosing Microsoft Azure as a cloud solution
Microsoft Azure offers a cloud computing service that was created by Microsoft for building, deploying, testing and managing applications and services. It provides a platform as a service (PaaS), software as a service (SaaS) and infrastructure as a service (IaaS) and supports many different programming languages, tools and frameworks, including both Microsoft-specific and third-party software and systems. Out of over 600 services, we will need the following components to build a basic analytics infrastructure:
Key Components to build a Data Mart
(a) Azure Data Factory – This is a data integration service that allows the creation of data-driven workflows in the cloud for orchestrating and automating data movement and data transformation
(b) SQL Database-This is a fully managed cloud data warehouse which will be used for creating additional business logic and aggregation logic etc.
(c) Azure Data Lake Store- This is scalable data storage and analytic service for big data analytics workloads that require developers to run massively parallel queries
Overall Cost of running this DataMart
Microsoft Azure provides billing details of each component and resource group separately. One can track and perform several optimizing strategies to reduce the monthly billings. Generally, the billing depends on a number of factors:
· Each Azure components have different billings. Depending on the criticality, one can optimize the usage of each of the modules.
· Amount of data processed.
· Loading strategy – Incremental load vs full load. We have discussed this in more details in the next sections.
· Frequency of refresh needed. For example, whether the refresh is needed on a daily basis//weekly basis/monthly basis etc.
On an overall basis, assuming that there are no more than 5 GB of Data and complexity of transformation is not high, it may cost $400-$800 /month for a particular Xero account. Assuming that you will be running multiple client accounts, a single DataMart can be created to unify the reporting and the cost can be further brought down on a per-account basis.