17

Cinchoo ETL - Flattening Complex, Nested JSON Objects

 2 years ago
source link: https://www.codeproject.com/Tips/5316047/Cinchoo-ETL-Flattening-Complex-Nested-JSON-Objects
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.

This tip talks about flattening complex, nested JSON file using Cinchoo ETL framework. It is very simple to use, with few lines of code, the conversion can be done. You can convert large files as the conversion process is stream based, quite fast and with low memory footprint.

2. Requirement

This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.

3. How to Use

3.1 Sample Data

Let's begin by looking into the below JSON file containing complex, nested structure. Say, for example, we have two companies and the first one has two branches and the second has just one branch.

Listing 3.1.1 Sample JSON File (company.json)
JavaScript
Copy Code
[
  {
    "ID": "1",
    "Name": "C1",
    "Branches": [
      {
        "City": "New York",
        "Country": "USA"
      },{
        "City": "California",
        "Country": "USA"
      }
    ]
  },
  {
    "ID": "2",
    "Name": "C2",
    "Branches": [
      {
        "City": "Mexico City",
        "Country": "Mexico"
      }
    ]
  }
]

We are going to flatten using Cinchoo ETL Framework to get the result as shown in the below table:

ID Name Branches/0/City Branches/0/Country Branches/1/City Branches/1/Country 1 C1 New York USA California USA 2 C2 Mexico City Mexico null null

Expected JSON output looks as below:

JavaScript
Copy Code
[
  {
    "ID": "1",
    "Name": "C1",
    "Branches/0/City": "New York",
    "Branches/0/Country": "USA",
    "Branches/1/City": "California",
    "Branches/1/Country": "USA"
  },
  {
    "ID": "2",
    "Name": "C2",
    "Branches/0/City": "Mexico City",
    "Branches/0/Country": "Mexico",
    "Branches/1/City": null,
    "Branches/1/Country": null
  }
]

3.2 Install Library

Next, install ChoETL.JSON / ChoETL.JSON.NETStandard nuget package. To do this, run the following command in the Package Manager Console.

.NET Standard Framework

Copy Code
Install-Package ChoETL.JSON

.NET Core

Copy Code
Install-Package ChoETL.JSON.NETStandard

Now add ChoETL namespace to the program.

JavaScript
Copy Code
using ChoETL;

3.3 Quick Conversion

Let's use the library to convert the complex structured JSON file to flatten format. It is as simple as can be done with few lines of code. No POCO class needed. It is fast, stream based, and consumes low memory.

Listing 3.3.1. Quick JSON file conversion
JavaScript
Copy Code
private static void QuickConversion()
{
    using (var r = new ChoJSONReader.LoadText("companies.json"))
    {
        using (var w = new ChoJSONWriter("companies_out.json")
               .Configure(c => c.ThrowAndStopOnMissingField = false)
               .Configure(c => c.FlattenNode = true)
               .Configure(c => c.DefaultArrayHandling = false)
               .Configure(c => c.NestedKeySeparator = '/')
               .WithMaxScanNodes(3)
               )
            w.Write(r);
    }
}

Create an instance of ChoJSONWriter for producing flatten JSON (companies_out.json) output file. Then create an instance of ChoJSONReader object for reading complex JSON (companies.json) file.

Where:

  • Configure(c => c.FlattenNode = true) - tells the writer to flatten the json
  • Configure(c => c.ThrowAndStopOnMissingField = false) - configure the writer to ignore missing field while writing. In the sample above, the second node, comes with 1 branch object, but expect to output the second branch with null values.
  • Configure(c => c.NestedKeySeparator = '/') - tells the writer to use '/' as nested key separator.
  • Configure(c => c.DefaultArrayHandling = false) - tells the writer to turn off the treating the empty nodes as simple object (not as array).
  • WithMaxScanNodes(3) - tells the writer to scan the number of nodes to discover the optimal maximum number of fields required to output the flat out JSON.

Sample Fiddle: https://dotnetfiddle.net/uzZdKU

3.4 Using POCO Conversion

This approach uses POCO model classes to load the JSON file. First, define the POCO classes matches the companies.json file.

Listing 3.4.1. POCO classes

Copy Code
public class Company
{
    public string ID { get; set; }
    public string Name { get; set; }
    public List<Branch> Branches { get; set; }
}

public class Branch
{
    public string City { get; set; }
    public string Country { get; set; }
}    
Listing 3.4.2. Conversion using POCO classes
JavaScript
Copy Code
private static void SelectiveColumnTest()
{
    using (var r = new ChoJSONReader<Company>("companies.json"))
    {
        using (var w = new ChoJSONWriter<Company>("companies_out.json")
            .Configure(c => c.FlattenNode = true)
            .Configure(c => c.ThrowAndStopOnMissingField = false)
            .Configure(c => c.NestedKeySeparator = '/')
            .WithMaxScanNodes(3)
            )
            w.Write(r);
    }
}

Create an instance of ChoJSONWriter for producing flatten JSON (companies_out.json) output file. Then create an instance of ChoJSONReader object for reading complex JSON (companies.json) file.

Where:

  • Configure(c => c.FlattenNode = true) - tells the writer to flatten the json
  • Configure(c => c.ThrowAndStopOnMissingField = false) - configures the writer to ignore missing field while writing. In the sample above, the second node, comes with 1 branch object, but expect to output the second branch with null values.
  • Configure(c => c.NestedKeySeparator = '/') - tells the writer to use '/' as nested key separator.
  • WithMaxScanNodes(3) - tells the writer to scan the number of nodes to discover the optimal maximum number of fields required output the flat out JSON.

Sample fiddle: https://dotnetfiddle.net/AX02BN

Download the sample attached above, try it.

For more information about Cinchoo ETL, please visit the other CodeProject articles:

History

  • 25th October, 2021: Initial version

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK