5

Cinchoo ETL - CSV Lite Reader

 2 years ago
source link: https://www.codeproject.com/Articles/5320967/Cinchoo-ETL-CSV-Lite-Reader
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time. This article talks about using CSVReader component offered by ChoETL framework. It is a simple utility class to extract CSV data from file / source.

Features:

  • Ultra-fast CSV Reader, parses CSV file quickly. (1Millon rows, 20 columns taking about ~7secs) 
  • Stream based parsers allow for ultimate performance, low resource usage, and nearly unlimited versatility scalable to any size data file, even tens or hundreds of gigabytes.
  • Follows CSV standard file rules (Multi-line, quoted columns etc.). 
  • In addition to comma, most delimiting characters can be used, including tab delimited fields.
  • Exposes IEnumarable list of objects - which is often used with LINQ query for projection, aggregation and filtration etc.
  • Supports deferred reading.
  • Ability to get typed list of objects from CSV file.

2. Requirement

This framework library is written in C# using .NET 4.5 Framework / .NET core 2.x.

3. "Hello World!" Sample

  • Open VS.NET 2013 or higher
  • Create a sample VS.NET (.NET Framework 4.5) Console Application project
  • Install ChoETL via Package Manager Console using Nuget Command based on the .NET environment:
    • Install-Package ChoETL
    • Install-Package ChoETL.NETStandard
  • Use the ChoETL namespace

Let's begin by looking into a simple example of reading CSV file having 2 columns

Listing 3.1 Sample CSV data file (Emp.csv)

Copy Code
Id,Name
1,Tom
2,Carl
3,Mark

There are number of ways you can get the CSV file parsing started with minimal setup

3.1. Quick load - Data First Approach

It is the zero config, quick way to load a CSV file in no time. No POCO object is required. Sample code below shows how to load the file

Listing 3.1.1 Load CSV file using iterator (fastest)

Copy Code
using (var r = new ChoCSVLiteReader())
{
    //Open the reader, skip the header
    foreach (var cols in r.ReadFile("emp.csv").Skip(1))
    {
        Console.WriteLine($"Id: {cols[0]}");
        Console.WriteLine($"Name: {cols[1]}");
    }
}

Sample fiddle: https://dotnetfiddle.net/kWhr27

Listing 3.1.2 Load CSV file using loop (fastest)

Copy Code
using (var r = new ChoCSVLiteReader())
{
    var recNum =  r.ReadFile("emp.csv").Skip(1).GetEnumerator();
    //Open the reader, skip the header
    while (recNum.MoveNext())
    {
        var cols = recNum.Current;
        Console.WriteLine($"Id: {cols[0]}");
        Console.WriteLine($"Name: {cols[1]}");
    }
}

Sample fiddle: https://dotnetfiddle.net/bV7nq5

You can also access csv fields by names as well. Sample below shows how to access them by names

Using ChoDynamicObject (special type of expando object)

Listing 3.1.3 Load CSV file using column names (using ChoDynamicObject)

Copy Code
using (var r = new ChoCSVLiteReader())
{
    foreach (dynamic rec in r.ReadFile<ChoDynamicObject>("emp.csv", true))
    {
        Console.WriteLine($"Id: {rec.Id}");
        Console.WriteLine($"Name: {rec.Name}");
    }
}

Sample fiddle: https://dotnetfiddle.net/PTnx2L

Using ExpandoObject 

Listing 3.1.4 Load CSV file using column names (using ExpandoObject)

Copy Code
using (var r = new ChoCSVLiteReader())
{
    foreach (var rec in r.ReadFile<ExpandoObject>("emp.csv", true))
    {
        Console.WriteLine($"Id: {rec.Id}");
        Console.WriteLine($"Name: {rec.Name}");
    }
}

If the CSV file does not comes with header, CSVReader auto name the columns as Column1, Column2 ... in the dynamic object.

3.2. Code First Approach

This is another zero config way to parse and load CSV file using POCO class. First define a simple data class to match the underlying CSV file layout

Listing 3.2.1 Simple POCO entity class

Copy Code
public partial class EmployeeRec
{
    public int Id { get; set; }
    public string Name { get; set; } 
}

In above, the class defines two properties matching the sample CSV file template.

3.2.1 Using User Defined Mapper

Sample below shows how to load CSV using custom user defined mapper 

Listing 3.2.1.1 Load CSV file with custom user defined mapper

Copy Code
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true, 
     mapper: (lineno, cols, rec) =>
     {
         rec.Id = cols[0].CastTo<int>();
         rec.Name = cols[1];
     }))
{
    Console.WriteLine($"Id: {rec.Id}");
    Console.WriteLine($"Name: {rec.Name}");
}

In above sample, we take control of loading CSV values to object members using mapper function.

Sample fiddle: https://dotnetfiddle.net/NZZ5EK

3.2.2 Using default built-in Mapper

Sample below shows how to load the CSV file using default built-in mapped comes with CSV reader

Listing 3.2.2.1 Load CSV file with built-in mapper (default map)

Copy Code
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true))
{
    Console.WriteLine($"Id: {rec.Id}");
    Console.WriteLine($"Name: {rec.Name}");
}

In above sample, we let the parse use the built-in mapper feature to load the CSV values to object members. By default, the built-in mapper simply maps the CSV columns to object members by index (first column map to first object member, second one maps to second object member and so on).

Sample fiddle: https://dotnetfiddle.net/IZRKWT

3.2.3 Using positional built-in Mapper

If the CSV files comes with different order from defined POCO model object, but wanted to load them by positional mapping, you can do so by decorating the object members with ColumnAttribute to specify the mapping order to CSV columns

Listing 3.2.3.1 POCO entity class with OrderAttribute

Copy Code
public partial class EmployeeRec
{
    [Column(Order=1)]
    public string Name { get; set; } 
    [Column(Order=0)] 
    public int Id { get; set; }
}

Listing 3.2.3.2 Load CSV file with built-in mapper (positional map)

Copy Code
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true))
{
    Console.WriteLine($"Id: {rec.Id}");
    Console.WriteLine($"Name: {rec.Name}");
}

In above sample, parser used order attribute to map the CSV columns to corresponding object members during parsing.

Sample fiddle: https://dotnetfiddle.net/fwd3j5

3.2.4 Using naming built-in Mapper

If the CSV files comes with column headers not matching with defined POCO model object members, you can match them by using DisplayNameAttribute / DisplayAttribute to specify the CSV column names to object members

Listing 3.2.4 POCO entity class with DisplayNameAttribute

Copy Code
public partial class EmployeeRec
{
    [DisplayName("Id")]
    public int Identifier { get; set; }
    [DisplayName("Name")]
    public string GivenName { get; set; } 
}

Listing 3.2.5 Load CSV file with built-in mapper (name map)

Copy Code
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true))
{
    Console.WriteLine($"Id: {rec.Id}");
    Console.WriteLine($"Name: {rec.Name}");
}

In above sample, parser uses the display attributes to map the CSV columns to corresponding object members during the parsing.

Sample fiddle: https://dotnetfiddle.net/K65Ywq

3.2. Other Reader Methods

Non-generices overloads

  1. ReadText - Parses csv text, returns string[].
  2. ReadFile - Parses csv file, returns string[].
  3. Read - Parses csv stream, returns string[].
  4. ReadLines - Parses csv lines, returns string[].

Generic overloads

  1. ReadText<T> - Parses csv text, returns T.
  2. ReadFile<T>- Parses csv file, returns T
  3. Read<T>- Parses csv stream, returns T
  4. ReadLines<T>- Parses csv lines, returns T

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK