What I’ve learned from my first play around with BigQuery/Sheets integration

So this just happened. BigQuery integration with Sheets!. This is great; now you can use a spreadsheet as a federated datasource for BigQuery, and therefore query (and join with other massive datasets) your spreadsheet data in real time.

It’s good, but it’s not quite all that yet, so here’s a few things I found out that might save you some time as you start to enjoy this awesomeness. I assume that it’s not fully released yet, and these things will be sorted out soon.

It’s a CSV

The blogpost shows this as the way to get your Sheet connected to bigQuery.

Actually, the Google Sheets doesn’t exist as an option, so the only selection that works is CSV.

The url

The ? in the BigQuery UI says the format is

You can do that, but it’s probably more convenient to use the url from the sheets editor.

https://docs.google.com/spreadsheets/d/1HLrLO2xnbk_1vezB_kmPvDFv1oX50HcD1P1lQfv7KKY/edit#gid=0

 

Selecting the sheet

The gid= parameter is used to identify which sheet to use, but this is apparently ignored. The first sheet is used regardless of the gid specified. So this means that only the first sheet in the workbook is able to be BigQuerified.

The schema

You’d think that the sheet would be able to automatically figure out its own schema and data type, using the heading row of the sheet, but it doesn’t. You have to specify the schema manually – entering each field. It’s a good idea to go into ‘edit as text’ once you’ve set the schema up and save the definition somewhere – which will look something like this, since you can’t seem to be able to edit the setting (or even get to this option) once the table has been created.

ipnumber:STRING,name:STRING

Skipping the header row

Since this is being treated as a CSV, the header row is nothing special – and you need to tell it to skip it – since you’ve set up the schema manually anyway, otherwise the header row will end up as part of your data.

Changing the sheet structure

If you ,say, add a column, after creating the schema

that’s to say my schema says this

but I’ve since added a new column in my sheet

This query

SELECT * FROM [airports.bb]

returns the data based on the original schema

which means that the schema is position based and not in any way dynamic. It’s not surprising since the heading row is not used anyway, but it’s certainly something to look out for.

Summary

All this means that what we are able to do today doesn’t seem to exactly match the expectations raised by the released blogpost – Hopefully when we see the version with the Google Sheets input option – all this will be taken care of.

 

Author: brucemcp

Leave a Reply

Your email address will not be published. Required fields are marked *