BigQuery

Stream form answers into Google BigQuery.

The Google BigQuery integration uses the BigQuery streaming API to insert answers into a BigQuery table as they are received.

BigQuery billing

The BigQuery streaming API is not available in Google Cloud's free tier. See pricing.

Authentication

Formsort authenticates to BigQuery using a service account. Once the service account is created, export its credentials as a JSON keyfile and upload the keyfile to Formsort.

Note that the service account needs only the minimum permissions necessary to write data into the user-created table, such as the bigquery.dataEditor predefined role.
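Before uploading, you can sanity-check the exported keyfile locally. The sketch below is illustrative, not part of Formsort: the keyfile path and table ID are placeholders, and the second helper requires the google-cloud-bigquery package.

```python
# Core fields every Google service-account keyfile contains.
REQUIRED_KEYFILE_FIELDS = {"type", "project_id", "private_key", "client_email"}


def missing_keyfile_fields(keyfile: dict) -> set:
    """Return any core fields absent from a parsed keyfile dict."""
    return REQUIRED_KEYFILE_FIELDS - set(keyfile)


def can_read_table(keyfile_path: str, table_id: str) -> bool:
    """Confirm the service account can fetch the table's metadata.

    bigquery.dataEditor includes bigquery.tables.get, so a Forbidden or
    NotFound error here usually means the role grant or the table ID is
    wrong. Both arguments are placeholders for your own values.
    """
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client.from_service_account_json(keyfile_path)
    client.get_table(table_id)  # raises on a missing table or permissions
    return True
```

Run `can_read_table("keyfile.json", "project.dataset.table")` with your own values after granting the role; a clean return means the grant is in place.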

Frequency

The frequency of inserting a row to BigQuery can be configured as:

  • On Finalize: only at the end of the flow.

  • On Savepoint: after each step marked as a save point, and at the end of the flow.

  • Every step: at the end of each step (when the responder advances using the Next button), and at the end of the flow.

  • Debounced: when the responder abandons the flow after a period of inactivity, and at the end of the flow. Formsort recommends this setting to reduce load.

Security

Formsort's backend connects to BigQuery exclusively from the static IP address 18.217.92.196.

Table schema

We take your security seriously and require only the minimum permissions necessary to write to your BigQuery table (see Authentication). Because of this, you must create the table Formsort writes to yourself; Formsort will not create the table automatically.

Since the schema of a form flow can change between deployments as questions are added and removed, the BigQuery table schema stores answer values in a repeated answers field.
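To read individual answers back out of the repeated field, you can UNNEST it in a query. The helper below just builds the SQL as a string; the table ID and answer key are placeholders, and you would run the result with a BigQuery client.

```python
def answer_history_query(table_id: str, answer_key: str) -> str:
    """Build a query listing every submission of one answer key.

    `table_id` ("project.dataset.table") and `answer_key` are
    placeholders. This interpolates directly for readability; prefer
    query parameters for untrusted input.
    """
    return (
        f"SELECT responder_uuid, a.value, submitted_at\n"
        f"FROM `{table_id}`, UNNEST(answers) AS a\n"
        f"WHERE a.key = '{answer_key}'\n"
        f"ORDER BY submitted_at DESC"
    )
```

With google-cloud-bigquery installed, `bigquery.Client().query(answer_history_query(...)).result()` executes it.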

  • submitted_at (TIMESTAMP, REQUIRED): UTC timestamp of when the answers were received by Formsort.

  • responder_uuid (STRING, REQUIRED): The responder's UUID.

  • flow_label (STRING, REQUIRED): The flow label.

  • variant_label (STRING, REQUIRED): The variant label.

  • variant_revision_uuid (STRING, REQUIRED): The variant revision UUID under which these answers were collected.

  • event_type (STRING, REQUIRED): Which analytics event this payload corresponds to; currently either StepCompleted or FlowFinalized.

  • answers (RECORD, REPEATED): One record per answer, with fields:

    • key (STRING, REQUIRED): The answer variable label.

    • value (STRING, REQUIRED): The value of the answer, as a string.

    • type (STRING, REQUIRED): The answer's original type: STRING, NUMERIC, or BOOLEAN.

  • schema_version (NUMERIC, REQUIRED): The version of the BigQuery adapter in use; currently 2.
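Concretely, a single streamed row shaped to this schema might look like the following. This is a hypothetical example: the UUIDs, labels, and answer values are invented for illustration.

```python
# Hypothetical example row; all values below are invented.
example_row = {
    "submitted_at": "2023-06-01T17:03:24Z",
    "responder_uuid": "1f0e9c2a-7b41-4a8e-9d3b-2c5f8a6e0d11",
    "flow_label": "onboarding",
    "variant_label": "main",
    "variant_revision_uuid": "9a8b7c6d-5e4f-4a3b-8c2d-1e0f9a8b7c6d",
    "event_type": "FlowFinalized",
    "answers": [
        # Every value arrives as a string; `type` records the original type.
        {"key": "email", "value": "pat@example.com", "type": "STRING"},
        {"key": "age", "value": "34", "type": "NUMERIC"},
        {"key": "accepted_terms", "value": "true", "type": "BOOLEAN"},
    ],
    "schema_version": 2,
}
```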

Schema definition in JSON

To create the table within the BigQuery console, we recommend copying and pasting the following JSON schema using the Edit as text toggle:

[
    {
        "name": "submitted_at",
        "type": "TIMESTAMP",
        "mode": "REQUIRED"
    },
    {
        "name": "responder_uuid",
        "type": "STRING",
        "mode": "REQUIRED"
    },
    {
        "name": "flow_label",
        "type": "STRING",
        "mode": "REQUIRED"
    },
    {
        "name": "variant_label",
        "type": "STRING",
        "mode": "REQUIRED"
    },
    {
        "name": "variant_revision_uuid",
        "type": "STRING",
        "mode": "REQUIRED"
    },
    {
        "name": "event_type",
        "type": "STRING",
        "mode": "REQUIRED"
    },
    {
        "name": "answers",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "key",
                "type": "STRING",
                "mode": "REQUIRED"
            },
            {
                "name": "value",
                "type": "STRING",
                "mode": "REQUIRED"
            },
            {
                "name": "type",
                "type": "STRING",
                "mode": "REQUIRED"
            }
        ]
    },
    {
        "name": "schema_version",
        "type": "NUMERIC",
        "mode": "REQUIRED"
    }
]
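If you prefer to keep this JSON in version control, it can also be parsed back into client-library schema objects. A sketch, assuming you saved the JSON above to a file whose name is an arbitrary placeholder; `SchemaField.from_api_repr` is part of google-cloud-bigquery.

```python
import json


def top_level_fields(schema_json: str) -> list:
    """Return the top-level field names from a JSON schema string."""
    return [field["name"] for field in json.loads(schema_json)]


def load_schema(path: str = "formsort_schema.json") -> list:
    """Parse a saved schema file into SchemaField objects.

    The path is a placeholder for wherever you stored the JSON above;
    requires the google-cloud-bigquery package.
    """
    from google.cloud import bigquery

    with open(path) as f:
        return [bigquery.SchemaField.from_api_repr(field) for field in json.load(f)]
```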

Schema definition in Python

from google.cloud import bigquery

schema = [
    bigquery.SchemaField("submitted_at", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("responder_uuid", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("flow_label", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("variant_label", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("variant_revision_uuid", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("event_type", "STRING", mode="REQUIRED"),
    bigquery.SchemaField(
        "answers",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("key", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("type", "STRING", mode="REQUIRED"),
        ],
    ),
    bigquery.SchemaField("schema_version", "NUMERIC", mode="REQUIRED"),
]
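With the schema list defined, creating the table is one more call. A minimal sketch, assuming the google-cloud-bigquery package, application-default credentials, and a placeholder table ID:

```python
def looks_like_table_id(table_id: str) -> bool:
    """Cheap local check that an ID has the project.dataset.table shape."""
    parts = table_id.split(".")
    return len(parts) == 3 and all(parts)


def create_answers_table(table_id: str, schema):
    """Create the answers table from the schema list defined above.

    `table_id` is a placeholder like "my-project.my_dataset.formsort_answers".
    Raises google.api_core.exceptions.Conflict if the table already exists.
    """
    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials
    return client.create_table(bigquery.Table(table_id, schema=schema))
```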

Adding multiple BigQuery instances

You can send answer payloads to multiple BigQuery destinations. This is useful if you'd like to share your data across multiple endpoints, or route payloads with different submission frequencies to different destinations.

To enable multiple destinations, first complete setup for an initial destination (Destination 1). Then, click the "+ Add destination" button. Up to 3 instances can be added.

Once you've finished configuring all instances of your BigQuery integrations, be sure to Save your work with the button in the top right corner.

If you deployed flows before adding or updating your BigQuery integration, you should re-deploy those flows.
