All resources

Mastering JSON Parsing and Conversion in BigQuery

Data rarely comes neatly structured - it often arrives in layers, requiring careful extraction to unlock its full potential. JSON in BigQuery simplifies this challenge, offering a flexible way to store, parse, and analyze complex datasets.

i-radius

In this article, we’ll explore the key JSON functions in BigQuery, demonstrating how they simplify the process of parsing, extracting, and converting JSON data. From basic functions to advanced use cases, we’ll guide you through practical examples that will empower you to harness the full potential of JSON in your data analysis. 

What Is JSON in BigQuery?

In BigQuery, JSON is a flexible, semi-structured data format that allows for schema-on-read processing. 

Unlike the fixed schema required by STRUCT types, JSON doesn’t impose a predefined structure, enabling applications to ingest and query data based on assumptions. 

This flexibility is ideal for handling nested or evolving data, making it highly suitable for modern data pipelines.

Why Is JSON Important in BigQuery?

JSON is crucial for handling complex and semi-structured data in BigQuery. Here’s why JSON is important in BigQuery:

  • Flexibility in Handling Semi-Structured Data: JSON allows for dynamic, flexible data storage without a fixed schema, making it ideal for managing semi-structured data that doesn't fit neatly into relational tables.
  • Integration with APIs: JSON is a widely used data format for API communication, enabling seamless integration between BigQuery and various third-party services or systems.
  • Compatibility with Modern Data Pipelines: JSON is often used in data pipelines for transmitting and transforming data, making it an essential format in modern ETL (Extract, Transform, Load) processes.
  • Support for Complex Data Structures: BigQuery can efficiently store and query nested JSON objects and arrays, allowing for complex data relationships and hierarchies to be represented and queried directly.
  • Simplified Data Processing: JSON's flexible structure reduces the need for rigid schema definitions, allowing for faster changes to data models as business requirements evolve.
  • Efficiency in ETL Processes: With JSON operators available, the need for a separate transformation (T) stage in ETL processes can be minimized, enabling a more streamlined flow of data through the pipeline.

Overview of JSON Functions Used for Parsing and Conversion in BigQuery

BigQuery offers several functions designed for parsing and converting JSON data. These functions can be categorized into legacy extractors, standard extractors, lax converters, and other converters, making it easier to manipulate nested data types.

Legacy Extractors

Legacy extractors in BigQuery are older functions used to extract data from JSON-formatted strings. These functions include JSON_EXTRACT, JSON_EXTRACT_ARRAY, JSON_EXTRACT_SCALAR, and others. 

Note: BigQuery is deprecating these functions, but they still work. Google recommends using JSON_Value.

JSON_EXTRACT

JSON_EXTRACT is a legacy function that extracts JSON objects or values from a JSON-formatted string based on a specified JSON path. However, BigQuery now recommends using JSON_QUERY for more robust and flexible querying.

Syntax:

The syntax for JSON_EXTRACT is as follows:

1JSON_EXTRACT(json_string_expr, json_path)

This function extracts a JSON object or value from the provided json_string_expr using the specified json_path.

JSON_EXTRACT_ARRAY

JSON_EXTRACT_ARRAY is a legacy function used to extract JSON arrays from a JSON-formatted string. It has a more modern alternative, JSON_QUERY_ARRAY, which offers better flexibility and performance for handling JSON arrays.

Syntax:

The syntax for JSON_EXTRACT_ARRAY is as follows:

1JSON_EXTRACT_ARRAY(json_string_expr, json_path)

This function extracts a JSON array from the provided json_string_expr based on the specified json_path.

JSON_EXTRACT_SCALAR

JSON_EXTRACT_SCALAR is a legacy function that extracts scalar values from a JSON-formatted string. It is now considered less efficient compared to the modern alternative, JSON_VALUE, which is recommended for extracting scalar values in BigQuery.

Syntax:

The syntax for JSON_EXTRACT_SCALAR is as follows:

1JSON_EXTRACT_SCALAR(json_string_expr, json_path)

This function extracts a scalar value from the provided json_string_expr using the specified json_path.

JSON_EXTRACT_STRING_ARRAY

JSON_EXTRACT_STRING_ARRAY is a legacy function used to extract string arrays from a JSON-formatted string. It has a more modern alternative, JSON_VALUE_ARRAY, which is recommended for extracting string arrays more efficiently in BigQuery.

Syntax:

The syntax for JSON_EXTRACT_STRING_ARRAY is as follows:

1JSON_EXTRACT_STRING_ARRAY(json_string_expr, json_path)

This function extracts a string array from the provided json_string_expr based on the specified json_path.

Standard Extractors

Standard extractors in BigQuery are functions designed to efficiently extract JSON objects, arrays, and scalar values from JSON-formatted strings. These functions, such as JSON_QUERY, JSON_QUERY_ARRAY, JSON_VALUE, and JSON_VALUE_ARRAY, offer more flexibility and performance compared to legacy functions

JSON_QUERY

JSON_QUERY is a standard function in BigQuery used to extract JSON objects or arrays from a JSON-formatted string. This function allows you to extract complex JSON structures without altering their format, which is essential for handling nested or hierarchical data.

Syntax:

The syntax for JSON_QUERY is simple and requires a JSON string and a JSON path to extract the desired data:

1JSON_QUERY(json_string_expr, json_path)

This function extracts the JSON object or array from the provided json_string_expr based on the specified json_path.

Example:

In this example, we extract the order object, which contains details of two products, from a JSON string

This demonstrates how JSON_QUERY can be used to retrieve a specific object from a nested JSON structure:

1SELECT
2  JSON_QUERY(
3    JSON '{"order": {"details": [{"OrderID": 101, "ProductName": "Laptop"}, {"OrderID": 102, "ProductName": "Mouse"}]}}',
4    '$.order') AS json_data;

Here:

  • JSON '{"order": {"details": [{"OrderID": 101, "ProductName": "Laptop"}, {"OrderID": 102, "ProductName": "Mouse"}]}}':
    This is the JSON string being passed, representing an order containing a list of products.
  • $.order: The JSON path specifies that the function should extract the order object from the JSON string.
  • JSON_QUERY: This function extracts the order object along with its nested details as a valid JSON string without modification.
Extracting the "order" object from a JSON string using the JSON_QUERY function in BigQuery. i-shadow

JSON_QUERY_ARRAY

JSON_QUERY_ARRAY is a standard function in BigQuery used to extract an array of JSON values from a JSON-formatted string. It is ideal for extracting arrays of values from complex or nested JSON structures, providing flexibility in working with JSON data that includes arrays.

Syntax:

The syntax for JSON_QUERY_ARRAY is as follows:

1JSON_QUERY_ARRAY(json_string_expr, json_path)

This function extracts an array of JSON objects from the provided json_string_expr based on the specified json_path.

Example:

This example extracts the list of products in the details array from an order, converting it into an array of JSON values:

1SELECT JSON_QUERY_ARRAY(
2  JSON '{"order": {"details": [{"OrderID": 101, "ProductName": "Laptop"}, {"OrderID": 102, "ProductName": "Mouse"}]}}',
3  '$.order.details'
4) AS json_array;

Here:

  •  JSON '{"order": {"details": [{"OrderID": 101, "ProductName": "Laptop"}, {"OrderID": 102, "ProductName": "Mouse"}]}}': This is the JSON string being passed, representing an order with a details array of products.
  • $.order: details specify the path to the details array within the order object.
  • JSON_QUERY_ARRAY: extracts the details array from the JSON string and returns it as an array of JSON objects.
Retrieving an array of product details from a JSON string using the JSON_QUERY_ARRAY function in BigQuery. i-shadow

💡 Want to take your BigQuery skills to the next level? Dive into our comprehensive guide on BigQuery Array Functions. Learn how to work with complex data structures efficiently and unlock the full potential of arrays in BigQuery!

JSON_VALUE

JSON_VALUE is a standard function in BigQuery used to extract scalar values from a JSON-formatted string. It is ideal for retrieving individual data points, such as strings, numbers, or booleans, from JSON structures.

Syntax:

The syntax for JSON_VALUE is as follows:

1JSON_VALUE(json_string_expr, json_path)

This function extracts a scalar value from the provided json_string_expr based on the specified json_path.

Example:

In this example, we extract the OrderID from a JSON string representing an order in the sales dataset:

1SELECT JSON_VALUE(
2  JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}',
3  '$.OrderID'
4) AS scalar_order_id;

Here:

  • JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}': This is the JSON string being passed, representing an order with attributes such as OrderID, CustomerName, ProductName, Quantity, and TotalAmount.
  • $.OrderID: The JSON path specifies that the function should extract the OrderID value from the JSON string.
  • JSON_VALUE: This function extracts the scalar value (101) from the JSON string and returns it as a single value.
Extracting the scalar OrderID value from a JSON string with the JSON_VALUE function in BigQuery. i-shadow

JSON_VALUE_ARRAY

JSON_VALUE_ARRAY is a standard function in BigQuery used to extract a JSON array of scalar values from a JSON-formatted string. It is ideal for retrieving arrays of string values from nested or complex JSON structures.

Syntax:

The syntax for JSON_VALUE_ARRAY is as follows:

1JSON_VALUE_ARRAY(json_string_expr, json_path)

This function extracts an array of scalar values from the provided json_string_expr based on the specified json_path.

Example:

This example extracts the list of products in the details array from an order, returning the values as a string array:

1SELECT 
2  ARRAY(
3    SELECT JSON_EXTRACT_SCALAR(details, '$.ProductName')
4    FROM UNNEST(JSON_EXTRACT_ARRAY(json_data, '$.OrderDetails')) details
5  ) AS ProductNames
6FROM (
7  SELECT '{"OrderID": "105", "OrderDetails": [{"ProductID": "P001", "ProductName": "Laptop"}, {"ProductID": "P005", "ProductName": "Printer"}]}' AS json_data )

Here: 

  1. FROM Clause with Subquery: The subquery at the bottom creates a temporary table-like structure with a single row containing a JSON string, which includes An OrderID and OrderDetails.
  2. JSON_EXTRACT_ARRAY Function: JSON_EXTRACT_ARRAY(json_data, '$.OrderDetails') is used to extract the OrderDetails array from the JSON string. 
  3. UNNEST Function:  UNNEST(...) details takes the array output from JSON_EXTRACT_ARRAY and expands it into a set of rows. 
  4. JSON_EXTRACT_SCALAR Function: Within the ARRAY(SELECT ...) construct, JSON_EXTRACT_SCALAR(details, '$.ProductName') is called for each item of the unnested array.

The outer ARRAY(...) function aggregates all the ProductName values extracted by JSON_EXTRACT_SCALAR into a single array.

The ProductNames column will contain an array of the product names from the order, specifically ["Laptop", "Printer"] for the given example. This makes the data structured and easily accessible for further analytical processes or reporting.

Retrieving an array from JSON string using the JSON_VALUE_ARRAY function in BigQuery. i-shadow

Lax Converters

Lax converters are functions in BigQuery that attempt to convert JSON data into their corresponding SQL data types. These functions are flexible and can handle mismatched types by returning NULL if the conversion fails, helping to avoid query errors during data processing.

LAX_BOOL

LAX_BOOL is a function in BigQuery used to convert JSON data to a boolean. This function is particularly useful when working with JSON boolean values, as it allows BigQuery to safely interpret and convert them into SQL boolean types without causing errors.

Syntax:

The syntax for LAX_BOOL is as follows:

1LAX_BOOL(json_expr)

This function attempts to convert the provided JSON expression into a SQL boolean value.

Example:

In this example, we extract the OrderStatus and convert it into a SQL boolean value:

1SELECT 
2  LAX_BOOL(JSON_EXTRACT(JSON '{"OrderStatus": true}', '$.OrderStatus')) AS result;

Converting the OrderStatus field from a JSON object to a boolean value using the LAX_BOOL function in BigQuery. i-shadow

Here:

  • JSON '{"OrderStatus": "Shipped"}': This represents a JSON object where the OrderStatus field is a string, not a boolean.
  • LAX_BOOL: This function expects a JSON boolean value (true or false). When applied to a string value like "Shipped", it returns NULL because it cannot coerce a string to a boolean.

This returns TRUE, since the value extracted is a valid JSON boolean.

LAX_FLOAT64

LAX_FLOAT64 is a function in BigQuery used to convert JSON data into a FLOAT64 SQL type. It is useful when working with JSON numbers, allowing for safe conversion into the appropriate numeric type, even if the data isn't strictly a valid float.

Syntax:

The syntax for LAX_FLOAT64 is as follows:

1LAX_FLOAT64(json_expr)

This function attempts to convert the provided JSON expression into a FLOAT64 value.

Example:

In this example, we convert the TotalAmount from an order into a FLOAT64 value:

1SELECT LAX_FLOAT64(
2  JSON_EXTRACT(
3    JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}',
4    '$.TotalAmount'
5  )
6) AS result;

 Converting the TotalAmount field from a JSON object to a FLOAT64 value using the LAX_FLOAT64 function in BigQuery. i-shadow

Here:

  • JSON_EXTRACT(..., '$.TotalAmount'): pulls out the numeric value (1400) as JSON.
  • LAX_FLOAT64(): safely converts that JSON number into a SQL FLOAT64.

This query returns 1400.0 as a valid FLOAT64 number.

LAX_INT64

LAX_INT64 is a function in BigQuery used to convert JSON data into an INT64 SQL type. This function is useful for safely converting JSON numeric values into integer types, even if the data isn't strictly an integer.

Syntax:

The syntax for LAX_INT64 is as follows:

1LAX_INT64(json_expr)

This function attempts to convert the provided JSON expression into an INT64 value.

Example:

In this example, we convert the Quantity from an order in the sales dataset into an INT64 value:

1SELECT LAX_INT64(
2  JSON_EXTRACT(
3    JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}',
4    '$.Quantity'
5  )
6) AS result;

Converting the Quantity field from a JSON object to an INT64 value using the LAX_INT64 function in BigQuery. i-shadow

Here:

  • JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}': represents a JSON object with various order details, including Quantity.
  • JSON function: LAX_INT64 converts the Quantity (2) into an INT64 value.

LAX_STRING

LAX_STRING is a function in BigQuery used to convert JSON data into a STRING SQL type. This function is useful for safely converting JSON string values into SQL string types, ensuring consistent data handling.

Syntax:

The syntax for LAX_STRING is as follows:

1LAX_STRING(json_expr)

This function attempts to convert the provided JSON expression into a STRING value.

Example:

In this example, we convert the Quantity from an order in the sales dataset into an INT64 value:

1SELECT LAX_STRING(
2  JSON_EXTRACT(
3    JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}',
4    '$.CustomerName'
5  )
6) AS result;

Converting the ProductName field from a JSON object to a SQL string using the LAX_STRING function in BigQuery. i-shadow

Here:

  • JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}': represents a JSON object with various order details, including Quantity.
  • LAX_INT64: converts the Quantity (2) into an INT64 value.

Converters

Converters in BigQuery are functions that explicitly convert JSON data into corresponding SQL data types, such as BOOL, FLOAT64, INT64, and STRING. These functions are more rigid compared to LAX converters. 

While LAX converters like LAX_BOOL and LAX_FLOAT64 attempt conversions and return NULL if the data type mismatch occurs, regular converters will throw an error if the conversion fails. This makes converters stricter, ensuring that the data matches the expected type exactly.

💡 To learn more about how BigQuery handles type conversions across various data formats, check out our in-depth article on BigQuery Conversion Functions. It covers essential functions like CAST, SAFE_CAST, and others that help ensure data is appropriately converted across your SQL queries.

BOOL

BOOL is used to convert JSON data into a SQL boolean type. It works when the JSON data is a valid boolean expression. BOOL ensures that the JSON value is properly interpreted as either TRUE or FALSE, making it compatible for use in logical operations or conditions.

Syntax:

The syntax for BOOL is as follows:

1BOOL(json_expr)

This function converts the provided JSON expression into a SQL boolean value.

Example:

In this example, we convert the OrderStatus field from a JSON order object into a SQL boolean value using the BOOL() function. The value of OrderStatus is a JSON boolean (true), making it valid for direct conversion.

1SELECT BOOL(
2  JSON_EXTRACT(
3    JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "OrderStatus": true}',
4    '$.OrderStatus'
5  )
6) AS order_status;

Converting the OrderStatus field in a JSON object to a boolean value using the BOOL function in BigQuery. i-shadow

Here:

  • JSON string input: A JSON object containing fields like OrderID, CustomerName, ProductName, and OrderStatus, where OrderStatus is set to the boolean value true.
  • JSON path: $.OrderStatus is used to extract the value of the OrderStatus field from the JSON object.
  • JSON function: JSON_EXTRACT(...) returns the raw JSON boolean value, and BOOL(...) converts that into a SQL BOOLEAN value (TRUE in this case).

FLOAT64

FLOAT64 is used to convert JSON data into a FLOAT64 SQL type. This function is helpful when working with JSON numbers, ensuring that they are correctly represented as floating-point numbers in BigQuery. You can also use the wide_number_mode argument to control how large numbers are handled.

Syntax:

The syntax for FLOAT64 is as follows:

1FLOAT64(json_expr [, wide_number_mode => { 'exact' | 'round' } ])

This function converts the provided JSON expression into a FLOAT64 value. Optionally, the wide_number_mode argument can control how large numbers are processed.

Example:

In this example, we convert the TotalAmount from an order in the sales dataset into a FLOAT64 value:

1SSELECT FLOAT64(
2  JSON_EXTRACT(
3    JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}',
4    '$.TotalAmount'
5  )
6) AS total_amount_float;

Converting the TotalAmount field from a JSON object to a FLOAT64 value using the FLOAT64 function in BigQuery. i-shadow

Here:

  • JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}':  represents a JSON object with the TotalAmount field as a number.
  • FLOAT64: converts the TotalAmount (1400) into a FLOAT64 value.

INT64

INT64 is used to convert JSON data into an INT64 SQL type. This function is helpful when working with JSON numeric data that needs to be converted into integer values for use in SQL operations, ensuring compatibility with BigQuery's integer type.

Syntax:

The syntax for INT64 is as follows:

1INT64(json_expr)

This function converts the provided JSON expression into an INT64 value.

Example:

In this example, we convert the OrderID from an order in the sales dataset into an INT64 value:

1SELECT INT64(
2  JSON_EXTRACT(
3    JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}',
4    '$.OrderID'
5  )
6) AS order_id;

Converting the OrderID field from a JSON object to an INT64 value using the INT64 function in BigQuery. i-shadow

Here:

  • JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}': represents a JSON object with the OrderID field as a number.
  • INT64: converts the OrderID (101) into an INT64 value.

STRING

STRING is used to convert JSON data into a STRING SQL type. It ensures that JSON string values are safely interpreted and returned as SQL strings, making them suitable for further operations in SQL queries.

Syntax:

The syntax for STRING is as follows:

1STRING(json_expr)

This function converts the provided JSON expression into a SQL string value.

Example:

In this example, we convert the ProductName from the sales dataset into a STRING value:

1SELECT STRING(
2  JSON_EXTRACT(
3    JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}',
4    '$.ProductName'
5  )
6) AS product_name;

Extracting and converting the ProductName field from a JSON object to a SQL string using the STRING function in BigQuery. i-shadow

Here:

  • JSON '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}': represents a JSON object with the ProductName field as a string.
  • STRING: converts the ProductName ("Laptop") into a SQL string value.

Other Converters

In BigQuery, "Other Converters" refers to a set of functions used to convert JSON data into different SQL data types. These converters ensure that JSON data, which may come in various formats, can be safely and correctly transformed into the appropriate SQL type.

PARSE_JSON

PARSE_JSON is a BigQuery SQL function used to convert a JSON-formatted STRING into a JSON value. It allows BigQuery to interpret text formatted as JSON and turn it into a JSON object, making it manipulable for queries. 

Syntax:

The syntax for PARSE_JSON is as follows:

1PARSE_JSON(json_string_expr [, wide_number_mode => { 'exact' | 'round' } ])

This function parses a JSON-formatted string and converts it into a JSON object that can be further queried in BigQuery.

Example:

In this example, we convert the OrderID and ProductName from the sales dataset into JSON Object:

1SELECT PARSE_JSON(
2  '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}'
3) AS json_data;

 Parsing a JSON-formatted string into a JSON object with the PARSE_JSON function in BigQuery. i-shadow

Here:

  • '{"OrderID": 101, "CustomerName": "John Doe", "ProductName": "Laptop", "Quantity": 2, "TotalAmount": 1400}': represents a JSON-formatted string from the sales dataset.
  • PARSE_JSON: converts the JSON string into a valid JSON object that can be queried in BigQuery.

TO_JSON

TO_JSON is a BigQuery SQL function used to convert SQL values or entire rows into JSON format. This function is helpful when you want to represent data as JSON objects, especially when working with JSON-based APIs or exporting data in JSON format.

Syntax:

The syntax for TO_JSON is as follows:

1TO_JSON(sql_value [, stringify_wide_numbers => { TRUE | FALSE } ])

This function converts the given SQL value (or entire row) into a JSON-formatted object.

Example:

In this example, we convert rows from the sales dataset into JSON format:

1WITH SalesTable AS (
2  SELECT 101 AS OrderID, "John Doe" AS CustomerName, "Laptop" AS ProductName, 2 AS Quantity, 1400 AS TotalAmount
3  UNION ALL
4  SELECT 102, "Jane Smith", "Mouse", 5, 100
5  UNION ALL
6  SELECT 103, "Alice Brown", "Keyboard", 3, 150
7)
8SELECT TO_JSON(t) AS json_objects
9FROM SalesTable AS t;

Converting structured SQL rows into JSON objects using the TO_JSON function in BigQuery. i-shadow

Here:

  • SalesTable: represents a set of rows from the sales dataset with columns such as OrderID, CustomerName, ProductName, Quantity, and TotalAmount.
  • TO_JSON: converts each row from the SalesTable into a JSON object.

TO_JSON_STRING

TO_JSON_STRING is a BigQuery SQL function used to convert SQL values (like STRUCT, ARRAY, or other data types) into JSON-formatted strings. This function is useful when you need to represent SQL values as JSON strings for exporting or working with external systems that use JSON.

Syntax:

The syntax for TO_JSON_STRING is as follows:

1TO_JSON_STRING(value[, pretty_print])

This function converts the provided value (or entire row) into a JSON-formatted string.

Example:

In this example, we convert a row from the sales dataset into a JSON-formatted string:

1SELECT TO_JSON_STRING(
2  STRUCT(101 AS OrderID, "John Doe" AS CustomerName, "Laptop" AS ProductName, 2 AS Quantity, 1400 AS TotalAmount)
3) AS json_data;

Transforming structured data into a JSON-formatted string using the TO_JSON_STRING function in BigQuery. i-shadow

Here:

  • STRUCT(101 AS OrderID, "John Doe" AS CustomerName, "Laptop" AS ProductName, 2 AS Quantity, 1400 AS TotalAmount): represents a row of data from the sales dataset.
  • TO_JSON_STRING:  converts the STRUCT into a JSON-formatted string.

Practical Examples of Working with JSON in BigQuery

Working with JSON data in BigQuery can unlock powerful analytical capabilities, allowing you to handle complex, semi-structured data. In this section, we'll explore practical examples that demonstrate how to manipulate, parse, and extract data from JSON structures.

Inserting JSON Values into a Table

Storing JSON values in a BigQuery table allows you to manage semi-structured data within a structured schema. This is especially useful when working with JSON-formatted inputs that need to be preserved and queried efficiently alongside standard SQL columns.

To insert JSON data into a table, you first need to ensure that the table includes a column with the JSON data type. Once confirmed, we can use the following statement to 

1INSERT INTO `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_Dataset` 
2(OrderID, OrderDetails)
3VALUES
4  (104, JSON '{"CustomerName": "Bob White", "ProductName": "Monitor", "Quantity": 1, "TotalAmount": 250}'),
5  (105, JSON '{"CustomerName": "Alice Green", "ProductName": "Keyboard", "Quantity": 3, "TotalAmount": 150}');

Storing JSON-formatted order details into a BigQuery table using the INSERT INTO query. i-shadow

Here:

  • '{"CustomerName": "Bob White", "ProductName": "Monitor", "Quantity": 1, "TotalAmount": 250}': This represents the JSON-formatted data being inserted into the OrderDetails field for the first row, containing customer and product information.
  • JSON function: JSON converts the provided string into a JSON object that BigQuery can store in the OrderDetails field.

By using the JSON function in the INSERT INTO query, we can easily store and query JSON-formatted data in a BigQuery table.

Converting STRING Data to JSON Format

Converting a STRING type to a JSON type in BigQuery allows you to work with semi-structured data more effectively. This process is often necessary when dealing with data in string format that represents JSON objects.

Example:

In this example, we use the PARSE_JSON function to convert a STRING column (OrderID) from an existing table (Electronics) into a JSON object and store the results in a new table (JSON_Example):

1CREATE OR REPLACE TABLE `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_JSON_Example` AS
2SELECT
3  104 AS OrderID,
4  PARSE_JSON('{"CustomerName": "Bob White", "ProductName": "Monitor", "Quantity": 1, "TotalAmount": 250}') AS OrderDetails_JSON;

Safely converting a STRING column into a JSON object using the SAFE.PARSE_JSON function in BigQuery. i-shadow

Post conversion result with schema details of newly created table in BigQuery. i-shadow

Here:

  • CREATE OR REPLACE TABLE: Creates a new table named Electronics_JSON_Example in your dataset. If the table already exists, it will be replaced.
  • SELECT 104 AS OrderID: Adds a column named OrderID with a static value 104. You can change or parameterize this as needed.
  • PARSE_JSON('...') AS OrderDetails_JSON: Parses the raw JSON string into a BigQuery JSON data type.
  • The parsed JSON is stored in a column named OrderDetails_JSON.

Transforming Schematized Data into JSON

Transforming structured (schematized) data into JSON format allows you to convert your tables into JSON objects for more flexible analysis.

Example:

In this example, we convert order details from the Electronics dataset into JSON format using the JSON_OBJECT function:

1WITH SalesDetails AS (
2  SELECT 101 AS OrderID, 'ProductName' AS key, 'Laptop' AS value UNION ALL
3  SELECT 101, 'Quantity', '2' UNION ALL
4  SELECT 101, 'TotalAmount', '1400'
5)
6
7SELECT 
8  OrderID,
9  JSON_OBJECT(ARRAY_AGG(key), ARRAY_AGG(value)) AS json_data
10FROM SalesDetails
11GROUP BY OrderID;

Converting schematized order details into JSON objects using the JSON_OBJECT function in BigQuery. i-shadow

Here:

  • WITH SalesDetails AS (...): Creates a temporary table with structured key-value pairs for a single order. All rows share the same OrderID, but different fields (ProductName, Quantity, etc.).
  • key / value columns: These columns represent the JSON object's keys and their corresponding values.
  • ARRAY_AGG(key), ARRAY_AGG(value): Aggregates all keys and values for the same OrderID into two arrays.
  • JSON_OBJECT(...): Combines the two arrays into a single JSON object.
  • GROUP BY OrderID: Ensures the JSON object is built per order, allowing this to scale across multiple orders if needed.

This query converts structured order details into JSON objects grouped by OrderID, allowing for more flexible storage and analysis of the data.

Converting SQL Types to JSON Format

Converting SQL types to JSON format allows you to transform structured data into JSON objects that can be easily processed and integrated with other systems. 

Example:

In this example, we convert the structured order details (OrderID, CustomerName, and TotalAmount) from the Electronics dataset into a JSON object:

1SELECT TO_JSON(STRUCT(OrderID, CustomerName, ProductName, TotalAmount)) AS order_json
2FROM `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_Dataset`
3WHERE OrderID = 101;

Converting structured order details into a JSON object using the TO_JSON function. i-shadow

Here:

  • STRUCT(OrderID, CustomerName, ProductName, TotalAmount): This creates a structured value with the OrderID, CustomerName, ProductName, and TotalAmount columns from the sales dataset.
  • TO_JSON: This function converts the structured SQL value (created by STRUCT) into a JSON object.
  • WHERE OrderID = 101: Filters the data to only include the row with OrderID = 101, and the result is a JSON object for that specific order.

This query converts structured data from the sales dataset into a JSON object using the TO_JSON function, enabling better handling of semi-structured data.

Extracting Values from JSON Using Field Access and Subscript Operators

In BigQuery, you can access specific values in a JSON expression using field access operators for JSON objects and subscript operators for JSON arrays. This allows you to extract nested data in JSON format easily.

Example: Accessing a JSON object field

In this example, we extract the ProductName and TotalAmount fields from the OrderDetails JSON object.

1SELECT
2  OrderID,
3  OrderDetails_JSON.ProductName AS ProductName,
4  OrderDetails_JSON.TotalAmount AS TotalAmount
5FROM `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_JSON_Example`
6WHERE OrderID = 101;

Accessing the ProductName field directly from a JSON object using the field access operator in BigQuery. i-shadow

Here:

  • OrderDetails.ProductName: The field access operator (.) is used to access the ProductName field inside the OrderDetails JSON object.
  • OrderDetails.TotalAmount: The same field access operator extracts the TotalAmount field within the OrderDetails object.
  • WHERE OrderID = 101: Filters the data to return results only for the order with OrderID = 101.

Example: Accessing an array element using Subscript Operator

In this example, we extract the first item in the items array from the OrderDetails JSON object.

1SELECT
2  OrderID,
3  OrderDetails_JSON.items[0] AS first_item
4FROM `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_JSON_Example`
5WHERE OrderID = 105;

Extracting the first item from a JSON array using the subscript operator in BigQuery. i-shadow

Here:

  • OrderDetails_JSON.items[0]: The subscript operator ([0]) is used to access the first element in the items array inside the OrderDetails JSON object.
  • WHERE OrderID = 105: Filters the data to return results only for the order with OrderID = 105.

By using field access and subscript operators, we can easily extract both scalar values and array elements from a JSON object in BigQuery.

Extracting Scalar Values as Strings Using JSON_VALUE

The JSON_VALUE function in BigQuery extracts a scalar value from a JSON object and returns it as an SQL string. The specified path returns NULL if it doesn't point to a scalar value in the JSON. This function is particularly useful for extracting individual values from JSON objects as strings.

Example:

In this example, we use the JSON_VALUE function to extract the ProductName from the OrderDetails JSON field in the Electronics dataset.

1SELECT
2  JSON_VALUE(OrderDetails_JSON, '$.ProductName') AS product_name
3FROM `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_JSON_Example`
4WHERE OrderID = 104;

Using the JSON_VALUE function in BigQuery to extract the ProductName scalar value. i-shadow

Here:

  • JSON_VALUE(OrderDetails_JSON, '$.ProductName': Extracts the ProductName field from the OrderDetails JSON object and returns it as an SQL string.
  • WHERE OrderID = 104: Filters the dataset to return results for the specific order with OrderID = 104.

The JSON_VALUE function is an efficient way to extract scalar values from JSON objects and convert them into SQL strings for further processing.

Working with JSON Arrays in BigQuery

JSON data can contain arrays, and extracting these arrays in BigQuery requires specific functions. BigQuery does not treat JSON arrays the same as ARRAY<JSON> types, but you can use functions like JSON_QUERY_ARRAY and JSON_VALUE_ARRAY to work with them. 

Example:

In this example, we use the JSON_QUERY_ARRAY function to extract the items array from the OrderDetails JSON field in the sales dataset:

1SELECT
2  JSON_VALUE(JSON_QUERY_ARRAY(OrderDetails_JSON, '$.items')[OFFSET(0)], '$.ProductName') AS ProductName
3FROM `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_JSON_Example`
4WHERE OrderID = 105;

Extracting the "items" array from a JSON field in a BigQuery table using the JSON_QUERY_ARRAY function. i-shadow

Here:

  • JSON_VALUE(JSON_QUERY_ARRAY(OrderDetails_JSON, '$.items')[OFFSET(0)], '$.ProductName'): This function extracts the items array from the OrderDetails JSON field and returns it as an ARRAY<JSON>.
  • WHERE OrderID = 105: Filters the dataset to return results for the specific order with OrderID = 105.

By using JSON_QUERY_ARRAY, we can extract JSON arrays from structured data, allowing us to work with array-based information more effectively within BigQuery.

Advanced Use Cases of Parsing and Conversion of JSON Functions

Advanced use cases of parsing and converting JSON functions in BigQuery allow you to handle complex and deeply nested JSON structures with ease. These functions are invaluable for transforming and extracting data from large JSON datasets.

Working with Nested JSON Structures

Working with nested JSON structures in BigQuery allows you to handle complex data that contains multiple levels of nested objects and arrays. By using combinations of STRUCT and ARRAY, you can model and query hierarchical data. 

This technique is particularly useful when dealing with real-world JSON data, such as customer interactions, where multiple levels of nesting are common.

Example:

In this example, we simulate a nested JSON structure by combining STRUCT for customer details and an ARRAY for product attributes in the sales dataset:

1WITH Electronics_Dataset AS (
2  SELECT 
3    101 AS OrderID, 
4    'Laptop' AS ProductName, 
5    2 AS Quantity, 
6    1400 AS TotalAmount,
7    STRUCT(
8      'John Doe' AS CustomerName, 
9      'North' AS CustomerRegion
10    ) AS CustomerDetails,
11    [
12      STRUCT('Color' AS Attribute, 'Black' AS Value),
13      STRUCT('Size' AS Attribute, '15 inch' AS Value)
14    ] AS ProductAttributes
15)
16
17SELECT 
18  OrderID, 
19  ProductName, 
20  TotalAmount,
21  CustomerDetails.CustomerName,
22  CustomerDetails.CustomerRegion,
23  ProductAttributes
24FROM Electronics_Dataset;

Handling nested JSON structures in BigQuery by combining STRUCT and ARRAY. i-shadow

Here: 

  • STRUCT: The CustomerDetails is a nested STRUCT containing CustomerName and CustomerRegion.
  • ARRAY: The ProductAttributes is an ARRAY of STRUCT elements representing different product attributes such as color and size.
  • SELECT: Extracts the nested values from the STRUCT and ARRAY for each row.

Using STRUCT and ARRAY in BigQuery enables flexible querying and handling of complex, nested JSON data, allowing for more comprehensive data analysis.

Using LAX Functions for Flexible JSON Conversion

LAX functions in BigQuery provide a flexible way to convert JSON data into scalar SQL values. These functions allow you to attempt conversion even when the data type in the JSON does not exactly match the expected SQL type. 

Example:

In this example, we use the LAX_INT64 function to extract an INT64 value from the OrderID JSON field in the Electronics_Datset table:

1SELECT 
2  LAX_INT64(JSON_EXTRACT(OrderDetails_JSON, '$.OrderID')) AS order_id
3FROM `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_JSON_Example`
4WHERE OrderID = 200;

Using the LAX_INT64 function in BigQuery to convert a string-encoded OrderID into an INT64 value from JSON data. i-shadow

Here:

  • JSON_EXTRACT(OrderDetails_JSON, '$.OrderID'): Extracts the OrderID value from the OrderDetails_JSON column as a JSON scalar (e.g., "200"). The $.OrderID path looks for the OrderID key at the top level of the JSON object.
  • LAX_INT64(...): Converts the extracted JSON scalar (like "200") into a proper SQL INT64 value. If the field is missing or cannot be cast, it safely returns NULL instead of erroring out.
  • AS order_id: Renames the output column in the result to order_id.
  • WHERE OrderID = 200: Filters to only return rows where the OrderID column is 200. (Important: this is the regular SQL column, not the value inside the JSON.)

LAX functions like LAX_INT64 are powerful tools for flexibly converting JSON data into the appropriate SQL types.

Using Conditional Logic with CASE Statements

Conditional logic in BigQuery allows you to handle different JSON structures or missing fields using CASE statements. This approach lets you create custom logic that responds to variations in JSON data, ensuring that your queries can process diverse data formats without encountering errors. 

Example:

In this example, we use a CASE statement to check if the OrderStatus field exists within the OrderDetails JSON object. If it does, we return the status; otherwise, we return a default value ('Unknown').

1SELECT 
2  OrderID,
3  CASE 
4    WHEN JSON_VALUE(OrderDetails_JSON, '$.OrderStatus') IS NOT NULL THEN JSON_VALUE(OrderDetails_JSON, '$.OrderStatus')
5    ELSE 'Unknown'
6  END AS order_status
7FROM `owox-d-ikrasovytskyi-001.OWOX_Demo.Electronics_JSON_Example`
8WHERE OrderID = 105;

Using a CASE statement in BigQuery with the JSON_VALUE function to check for and return the OrderStatus field from a JSON object. i-shadow

Here: 

  • JSON_VALUE(OrderDetails, '$.OrderStatus'): Extracts the OrderStatus field from the OrderDetails JSON object.
  • CASE: Evaluates whether the OrderStatus field is NULL. If it is not, it returns the value of OrderStatus; otherwise, it returns 'Unknown'.
  • WHERE OrderID = 105: Filters the dataset to include only the row with OrderID = 105.

Using CASE statements in combination with JSON functions provides a robust way to handle variations in JSON structures, including missing or inconsistent fields.

Troubleshooting Challenges in Parsing JSON with BigQuery

Parsing JSON in BigQuery can present various challenges, such as handling large files, dealing with nested structures, and addressing type mismatches. Effective troubleshooting is key to ensuring smooth data processing and accurate results. 

Managing Large JSON Files

⚠️ Challenge: Processing large JSON files in BigQuery can be slow and resource-heavy, especially when dealing with massive datasets.

Solution: Optimize performance by partitioning your data based on relevant attributes (e.g., dates) and using clustering to group similar data. This reduces the amount of data processed during queries, improving execution time and lowering costs.

Error Handling with Nested JSON Objects

⚠️ Challenge: Handling nested JSON objects can be complex, as objects within objects may require special parsing methods to extract the correct data.

Solution: Use BigQuery's operators and functions like JSON_EXTRACT_ARRAY and JSON_EXTRACT_SCALAR to handle nested structures efficiently. By familiarizing yourself with these tools, you can easily extract specific data from deeply nested JSON objects and simplify the parsing process.

Handling JSON Type Mismatches

⚠️ Challenge: JSON data in BigQuery may contain unexpected or mismatched data types, leading to errors during operations or query failures.

Solution: To avoid issues, use SAFE_CAST and the LAX_* functions, which attempt to convert JSON data into the correct SQL data types. These functions return NULL instead of causing errors, ensuring smoother query execution even when data types don’t match as expected.

Dealing with Large JSON Numbers and Precision Loss

⚠️ Challenge: Parsing large JSON numbers in BigQuery can lead to precision loss if the values exceed BigQuery's numeric limits.

Solution: Use the wide_number_mode argument in the PARSE_JSON function to control how large numbers are handled. You can choose to either round the values or trigger an error, allowing you to maintain better control over the precision of the numbers in your queries.

Best Practices for JSON Handling in BigQuery

Efficiently handling JSON data in BigQuery is crucial for optimizing performance and maintaining data accuracy. By following best practices, you can streamline queries, reduce processing costs, and ensure data consistency.

Improving Query Performance

When crafting SQL queries for JSON parsing, it’s important to consider performance factors. Avoid unnecessary functions and operators, use filters and aggregations efficiently, and leverage BigQuery’s query analysis tools, like the query optimizer and execution plan visualization, to pinpoint and address performance bottlenecks.

Maintaining Data Accuracy and Consistency

Maintaining data accuracy and consistency is essential for reliable analysis. To ensure your JSON data is correctly parsed, use BigQuery’s data validation tools, like schema inference and data integrity constraints. These features help verify that your JSON data is well-structured and error-free, ensuring accurate and consistent results.

Limit Nested Queries and JSON Arrays

BigQuery handles nested JSON objects and arrays efficiently, but excessive nesting can slow down queries. Limit the depth of nested structures and avoid using overly complex arrays unless necessary. This practice helps improve query performance and makes data processing more efficient.

💡 Looking to enhance your GA4 data analysis? Check out our detailed guide on Unnesting GA4 Event Parameters in BigQuery. Learn how to efficiently unnest and analyze event parameters in BigQuery for deeper insights into your data!

Use Partitioning and Clustering

Partitioning and clustering large datasets in BigQuery can enhance query performance. Partition tables are based on relevant columns like dates to limit the data processed. Clustering groups similar data together, enabling faster querying of JSON fields within those partitions and reducing overall processing time.

Use Efficient Data Types

Selecting the right data types is essential for optimizing performance in BigQuery. Choose compact and efficient types for JSON data, such as using INT64 instead of STRING for numeric values. This reduces storage requirements and speeds up query execution, ensuring more efficient data processing.

Using Schema Auto-Detection in BigQuery

Schema auto-detection in BigQuery allows you to automatically infer column names and data types when loading data from formats like CSV, JSON, or Google Sheets. This feature reduces the need for manual schema definition and is especially useful when dealing with evolving or semi-structured datasets. BigQuery scans up to 500 rows from a sample file to detect field types and structures. If a column contains only null values, it defaults to the STRING type.

To use auto-detection, you can enable it through the BigQuery web UI by checking the Auto detect option in the Schema section during table creation, or by using the – autodetect flag with the bq load command in the CLI. This feature is not required for Avro, Parquet, or ORC files, as those are self-describing formats.

Essential BigQuery Functions for Efficient Data Processing

BigQuery offers a variety of functions designed explicitly for parsing, querying, and manipulating BigQuery data. These powerful functions allow you to work with nested and semi-structured data efficiently.

  • BigQuery Array Functions - This article discusses functions for working with arrays, which are relevant when handling JSON arrays.
  • BigQuery Conversion Functions - This includes functions like CAST and SAFE_CAST, which are helpful when converting data types in JSON processing.
  • BigQuery Navigation Functions - This covers functions like UNNEST for extracting data from arrays, which can be crucial for working with JSON structures.
  • BigQuery Date Functions - While focused on date functions, this can be useful when dealing with dates in JSON data.
  • BigQuery DateTime Functions - Provides functions for handling date and time data, which is often encountered in JSON records.
  • BigQuery Aggregate Functions - This article covers aggregate functions that can be useful when summarizing JSON data, especially when working with nested structures.

Gain Deeper Insights with the OWOX Reports Extension for Google Sheets

The OWOX Reports is a powerful tool that allows users to seamlessly integrate BigQuery data with Google Sheets, enabling enhanced reporting and analysis. By automating data extraction, transformation, and visualization, the extension simplifies complex workflows and ensures that reports are always up-to-date with the latest data.

This integration significantly reduces manual effort, allowing teams to focus on insights rather than data preparation.

With features like automatic data refresh, scheduled reports, and advanced data analysis capabilities, OWOX BI empowers businesses to gain deeper insights from their BigQuery data. The extension helps optimize decision-making and boost productivity by providing easy-to-access, real-time business intelligence.

FAQ

What is JSON in BigQuery?
How do I extract values from a JSON object in BigQuery?
What is the difference between JSON_QUERY and JSON_VALUE in BigQuery?
How do I handle nested JSON data in BigQuery?
What are LAX functions and when should I use them in BigQuery?

You might also like

2,000 companies rely on us

Oops! Something went wrong while submitting the form...