Load CSV data into BigQuery via a GCS bucket + Cloud Functions using Cloud Shell

$ head -4 us-500.csv
first_name,last_name,company_name,address,city,county,state,zip
James,Butt,"Benton, John B Jr",6649 N Blue Gum St,New Orleans,Orleans,LA,70116
Josephine,Darakjy,"Chaney, Jeffrey A Esq",4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116
Art,Venere,"Chemel, James L Cpa",8 W Cerritos Ave #54,Bridgeport,Gloucester,NJ,8014
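
Note that the company_name values contain embedded commas, so those fields are quoted in the CSV. A quick local sanity check with Python's csv module (a minimal sketch, run from the same directory as us-500.csv) confirms the quoting parses cleanly:

import csv

# Parse the file the same way BigQuery's CSV loader will, confirming that
# quoted fields such as "Benton, John B Jr" survive as a single column.
with open('us-500.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    print(header)  # ['first_name', 'last_name', ..., 'zip']
    for row in reader:
        assert len(row) == len(header), row
    print('all rows have {} columns'.format(len(header)))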
$ gsutil mb gs://ustestbucket
$ bq mk --dataset my-first-project:ustestdataset
$ bq mk --table ustestdataset.uscsvtable first_name:STRING,last_name:STRING,company_name:STRING,address:STRING,city:STRING,county:STRING,state:STRING,zip:INTEGER
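
For reference, the same dataset and table can be created from Python with the BigQuery client library; a minimal sketch, assuming the project and names used in the bq commands above:

from google.cloud import bigquery

client = bigquery.Client(project='my-first-project')
dataset_ref = bigquery.DatasetReference('my-first-project', 'ustestdataset')

# Equivalent of: bq mk --dataset my-first-project:ustestdataset
client.create_dataset(bigquery.Dataset(dataset_ref))

# Equivalent of: bq mk --table ustestdataset.uscsvtable <schema>
schema = [
    bigquery.SchemaField('first_name', 'STRING'),
    bigquery.SchemaField('last_name', 'STRING'),
    bigquery.SchemaField('company_name', 'STRING'),
    bigquery.SchemaField('address', 'STRING'),
    bigquery.SchemaField('city', 'STRING'),
    bigquery.SchemaField('county', 'STRING'),
    bigquery.SchemaField('state', 'STRING'),
    bigquery.SchemaField('zip', 'INTEGER'),
]
client.create_table(bigquery.Table(dataset_ref.table('uscsvtable'), schema=schema))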
main.py

import os

from google.cloud import bigquery


def csv_loader(data, context):
    client = bigquery.Client()
    dataset_id = os.environ['DATASET']
    dataset_ref = client.dataset(dataset_id)

    job_config = bigquery.LoadJobConfig()
    job_config.schema = [
        bigquery.SchemaField('first_name', 'STRING'),
        bigquery.SchemaField('last_name', 'STRING'),
        bigquery.SchemaField('company_name', 'STRING'),
        bigquery.SchemaField('address', 'STRING'),
        bigquery.SchemaField('city', 'STRING'),
        bigquery.SchemaField('county', 'STRING'),
        bigquery.SchemaField('state', 'STRING'),
        bigquery.SchemaField('zip', 'INTEGER'),
    ]
    job_config.skip_leading_rows = 1
    job_config.source_format = bigquery.SourceFormat.CSV

    # Get the URI for the uploaded CSV in GCS from 'data'.
    uri = 'gs://' + os.environ['BUCKET'] + '/' + data['name']

    # Kick off the load job.
    load_job = client.load_table_from_uri(
        uri,
        dataset_ref.table(os.environ['TABLE']),
        job_config=job_config)

    print('Starting job {}'.format(load_job.job_id))
    print('Function=csv_loader, Version=' + os.environ['VERSION'])
    print('File: {}'.format(data['name']))

    load_job.result()  # wait for table load to complete
    print('Job finished.')

    destination_table = client.get_table(dataset_ref.table(os.environ['TABLE']))
    print('Loaded {} rows.'.format(destination_table.num_rows))
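
For a background function triggered by google.storage.object.finalize, the data argument is the GCS event payload, and csv_loader only reads its 'name' key. That makes the function easy to exercise locally before deploying; a minimal sketch (a hypothetical local_test.py, assuming us-500.csv has already been uploaded and your credentials can reach the bucket and dataset):

import os

from main import csv_loader

# Mimic the variables supplied at deploy time via environment.yaml.
os.environ.update({
    'BUCKET': 'ustestbucket',
    'DATASET': 'ustestdataset',
    'TABLE': 'uscsvtable',
    'VERSION': 'v14',
})

# A minimal stand-in for the google.storage.object.finalize payload;
# the real event carries more fields, but only 'name' is used here.
fake_event = {'name': 'us-500.csv', 'bucket': 'ustestbucket'}
csv_loader(fake_event, context=None)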
environment.yaml
BUCKET: ustestbucket
DATASET: ustestdataset
TABLE: uscsvtable
VERSION: v14
requirements.txt
google-cloud
google-cloud-bigquery
.gcloudignore
*csv
*yaml
$ ls
environment.yaml  main.py  requirements.txt  us-500.csv
$ gcloud beta functions deploy us_csv_loader --runtime=python37 --trigger-resource=gs://ustestbucket --trigger-event=google.storage.object.finalize --entry-point=csv_loader --env-vars-file=environment.yaml
$ gsutil cp us-500.csv gs://ustestbucket/
Copying file://us-500.csv [Content-Type=text/csv]...
- [1 files][ 60.4 KiB/ 60.4 KiB]
Operation completed over 1 objects/60.4 KiB.
$ gcloud functions logs read
$ bq query --nouse_legacy_sql 'SELECT COUNT(*) FROM `my-first-project.ustestdataset.uscsvtable`'
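
The same row count can be checked from Python; a minimal sketch with the client library, assuming the project ID above:

from google.cloud import bigquery

client = bigquery.Client(project='my-first-project')
query = 'SELECT COUNT(*) AS n FROM `my-first-project.ustestdataset.uscsvtable`'
for row in client.query(query).result():
    print('{} rows loaded'.format(row.n))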
