Create strategies to pull data on a regular basis
Updated: Apr 21, 2020
After identifying the data sources, we next have to learn the mechanism to retrieve data from these sources. Data that is being generated on a daily basis makes it hard to track all the hidden insight. Let’s look at the steps to extract these data.
Usually, if the source system is on the cloud, you’ll find a set of documentation called as API specification which provides details on how to retrieve data, limitations etc.
Connecting Accounting Data Source (Xero Accounting) via API
Let’s take a sample example of how you can retrieve data from Xero Accounting. This exercise of connecting Xero needs a bit of technical expertise but you can still most of the part by reading the API documentation from Xero.
Understanding the Data Model of Data Sources
A Data Source usually consists of a large number of underlying Data Tables. This is because a transactional system has an underlying relational data model and that data model is designed to be great at getting data into a database in a way that maintains integrity and stores every piece of non-key data.
Let’s say you want to store data about customer purchases. You would probably like to store information viz. the dollar amount purchased, the customer name, Customer age, Customer location and the store location where the purchase took place, whether customer paid by cash or credit card etc. You could store all of these data points in a single table, but that would lead to problems. For example, a customer who has made hundreds of purchases may shift his location to another city or country. To reflect that change in the records, you would have to go through the entire table and change her location for every single purchase she’s ever made. Instead of dealing with hundreds of changes, a normalized data model would allow making only one change. In this situation, you should store customer data in a customer table, product data in a product table, store data in a store table, and so on. By using primary keys and foreign keys, you can link these tables together and access all the information you need.
There are typically over 50-100 Data Tables (typically known as endpoints) within each source systems. As an example, within Xero Accounting API, there are a lot of endpoints viz. Accounts, Bank Transactions, Credit Notes, Employees, Invoices, Items, Journals, Manual Journals, Organization etc. Further, in some endpoints groups has information for Balance Sheet, Aged Payables, Receivables, Profit & Loss, Trial Balance etc.
As you can see, the data when you download from Xero in .csv or .xlsx is more a summarized and aggregated data whereas the data from API is spread out in different tables and we need to choose the tables which have the most detailed granular information for our reporting purposes.
Let’s take an example of getting Profit & loss data: If you are looking for Revenue, COGS, SG&A sliced by various product types, Departments etc., then in the API call of Profit & Loss, we will need to get departments, divisions details via tracking categories as shown below.
This approach reflects that this is not just a developer’s job to simply pull data but one has to understand the business need and pull relevant parameters while building the DataMart to ensure that adequate data granularity is there for FP&A analytics reporting.
There are many other aspects of the API pull as there are hierarchies within a table. Customized Service wrappers for efficient mapping between the API Response and Data Models.