BigQuery Comes up Big

AWS boasts over 150 cloud services, and Google clocks in with at least 100. We understand EC2, Route53, Compute Engine, and several others in depth. One question gnaws at us: which services are we missing?
There simply is not enough time to gain expertise in every cloud service in order to evaluate its applicability to our particular solution and business needs.
Fortunately, sometimes the answer serendipitously falls into our lap. Here is one such story.
To start, as you may know, our products automate snapshot replication of EC2 instances or GCP virtual machines across regions for disaster recovery protection, and are available as SaaS solutions on the AWS and GCP Marketplaces. Minimizing overall cost for our users underpins our approach. One feature we offer is visibility into the ongoing replication fee that each EC2 instance or GCP VM is accruing, to help users identify applications that are hogging cloud spend.
Our product on AWS, if granted the necessary permissions, downloads the user’s AWS Cost Explorer report daily and searches the CSV file for the line items that correspond to cross-region snapshot replication.
Tags our product assigns to each snapshot identify the instance ID of the instance that was snapshotted. A script sums the costs from the last 30 days and presents them in the UI. The process is reasonably robust, but the lack of a formal schema in the CSV file and the need for the user to enable tags in the cost report potentially expose this code to breakage.
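For illustration only, a minimal sketch of that kind of summation appears below. It is not our production code; the column names (“UsageType”, “UsageStartDate”, “Cost”) and the tag column are placeholders for whatever the configured cost report actually contains, which is precisely where the fragility comes from.

```python
# Illustrative sketch only, not our production code. Column names and the
# usage-type filter are placeholders for the actual cost report layout.
import csv
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def replication_cost_per_instance(report_path: str) -> dict:
    """Sum the last 30 days of snapshot-replication cost per EC2 instance."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    totals = defaultdict(float)

    with open(report_path, newline="") as report:
        for row in csv.DictReader(report):
            # Placeholder filter for cross-region snapshot replication line items.
            if "SnapshotCopy" not in row.get("UsageType", ""):
                continue
            start = datetime.strptime(row["UsageStartDate"][:10], "%Y-%m-%d")
            if start.replace(tzinfo=timezone.utc) < cutoff:
                continue
            # The tag our product assigns to each snapshot carries the instance ID.
            instance_id = row.get("tag:instance-id", "untagged")
            totals[instance_id] += float(row["Cost"])

    return dict(totals)
```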
In porting this feature to our Google Cloud offering, we stumbled upon the BigQuery service, a data warehouse that we had not originally deemed necessary to investigate. However, GCP can export daily usage reports to BigQuery, and standard SQL queries against a published schema can robustly extract the data needed to calculate snapshot replication costs. What’s more, the cost information for each line item includes all of the generic labels assigned to the underlying object. Our product writes a label to each snapshot identifying the Compute Engine virtual machine from which the snapshot was taken, so a single standard SQL query gives us the cost information and maps each snapshot to its application precisely and robustly.
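To illustrate the labeling half of that equation, here is a rough sketch (again, not our production code) that attaches a source-VM label to a snapshot using the google-cloud-compute client library; the label key “source-vm” is a placeholder for whatever key the product actually uses.

```python
# Illustrative sketch only: record the source VM on a snapshot as a label so
# that it later shows up in the billing export line items.
from google.cloud import compute_v1


def label_snapshot_with_source_vm(project: str, snapshot_name: str, vm_name: str) -> None:
    client = compute_v1.SnapshotsClient()
    snapshot = client.get(project=project, snapshot=snapshot_name)

    labels = dict(snapshot.labels)
    labels["source-vm"] = vm_name  # placeholder label key

    client.set_labels(
        project=project,
        resource=snapshot_name,
        global_set_labels_request_resource=compute_v1.GlobalSetLabelsRequest(
            label_fingerprint=snapshot.label_fingerprint,  # required to avoid clobbering concurrent updates
            labels=labels,
        ),
    )
```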
Here’s the blow-by-blow of how a GCP service, previously unknown to us, quickly and efficiently became part of the critical infrastructure of our business:
First, in the Billing section of the Google Cloud console, we enable export of the cost report to a BigQuery dataset.
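Once the export is enabled, it is easy to confirm that data is landing. The short sketch below, with placeholder project and dataset names, simply lists the tables in the export dataset; Google names the export table along the lines of gcp_billing_export_v1_<billing account ID> once the first export arrives.

```python
# Sanity check that the billing export is populating BigQuery
# (project and dataset names are placeholders).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
for table in client.list_tables("my-project.billing_export"):
    print(table.table_id)  # e.g. gcp_billing_export_v1_XXXXXX once export is active
```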
Second, our SaaS product runs as a VM associated with an IAM Service Account that holds the minimum permissions needed to automate disaster recovery, so that no credentials are stored in the SaaS instance itself. This Service Account merely needs read access to the usage dataset.
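For illustration, that grant can be scripted with the google-cloud-bigquery client library, as in the sketch below; the project, dataset, and Service Account names are placeholders, and the same change can be made directly in the console.

```python
# Illustrative sketch: give the Service Account read access to the billing
# export dataset. It also needs permission to run query jobs in the project
# (for example roles/bigquery.jobUser).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
dataset = client.get_dataset("my-project.billing_export")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",  # service accounts are granted by email
        entity_id="dr-automation@my-project.iam.gserviceaccount.com",  # placeholder SA
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```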
Next, to obtain data replication costs, we generate a query that searches for “PD snapshot egress” from the region hosting the production applications over a given time frame, say, the last 7 days.
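A sketch of such a query, run through the google-cloud-bigquery client, is shown below. The export table name, the region, and the label key “source-vm” are placeholders to be filled in from the user’s configuration; the field names (sku, location, labels, usage, cost) come from the published billing export schema.

```python
# Illustrative sketch: sum snapshot replication cost per source VM over the
# last 7 days, then hand the result off as JSON.
import json

from google.cloud import bigquery

QUERY = """
SELECT
  (SELECT l.value FROM UNNEST(labels) AS l WHERE l.key = 'source-vm') AS source_vm,
  SUM(usage.amount)     AS amount_replicated,
  ANY_VALUE(usage.unit) AS usage_unit,
  SUM(cost)             AS replication_cost
FROM `my-project.billing_export.gcp_billing_export_v1_XXXXXX`
WHERE sku.description LIKE '%PD snapshot egress%'
  AND location.region = 'us-east1'
  AND usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY source_vm
ORDER BY replication_cost DESC
"""

client = bigquery.Client(project="my-project")
rows = list(client.query(QUERY).result())

# default=str covers values (timestamps, decimals) the json module
# cannot serialize natively.
print(json.dumps([dict(row.items()) for row in rows], indent=2, default=str))
```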
The result is a table of entries that can be exported as JSON and consumed by a script. Each entry contains the complete information: the amount of data replicated, the cost, and the label our product assigned to the snapshot identifying the VM with which it is associated.
Each line in our product UI that reports the status of an application can now break down its individual replication cost, and outliers can be identified at a glance. This feature will be available in our next release this winter.
Like many other cloud services, BigQuery required only nominal setup; the learning curve proved gentle, as documentation from both Google and third parties is readily available; and the actual monetary cost of experimenting with the service turned out to be minimal.
As a result, we were able to plug a feature gap in our Google offering with a robust, secure solution, even with a relatively small use of BigQuery. An important dividend, however, is that we have unexpectedly gained BigQuery expertise. We don’t know whether this service will be necessary in the future, but it is a nice addition to our toolbox. With the cloud, sometimes you don’t have to search for solutions; you can let the solutions come to you.