Reading Parquet Metadata In Power Query In Power BI

Chris Webb M, Power Query February 11, 2024February 10, 2024

1 Minute

There’s a new M function in Power Query in Power BI that allows you to read the data from a Parquet file: Parquet.Metadata. It’s not documented yet and it’s currently marked as “intended for internal use only” but I’ve been told I can blog about it. Here’s an example of how to use it:

let
  Source = Parquet.Metadata(File.Contents("C:\myfile.snappy.parquet"))
in
  Source

…and here’s an example of the output:

This query shows how to expand the record returned by this function into a table:

let
    m = Parquet.Metadata(File.Contents("C:\myfile.snappy.parquet")),
    schema = List.Accumulate(Table.ToRecords(m[Schema]), [], (x, y) => if y[NumChildren] = null then Record.AddField(x, y[Name], y[LogicalType] ?? y[ConvertedType]) else x),
    expanded1 = Table.ExpandTableColumn(m[RowGroups], "Columns", {"MetaData"}),
    renamed1 = Table.RenameColumns(expanded1, {{"Ordinal", "RowGroup"}, {"TotalCompressedSize", "RowGroupCompressedSize"}, {"TotalByteSize", "RowGroupSize"}}),
    expanded2 = Table.ExpandRecordColumn(renamed1, "MetaData", {"Type", "Encodings", "PathInSchema", "Codec", "NumValues", "TotalUncompressedSize", "TotalCompressedSize", "KeyValueMetadata", "DataPageOffset", "IndexPageOffset", "DictionaryPageOffset", "Statistics", "EncodingStats"}),
    renamed2 = Table.RenameColumns(expanded2, {{"Type", "PhysicalType"}}),
    added1 = Table.AddColumn(renamed2, "Column", each Text.Combine([PathInSchema])),
    added2 = Table.AddColumn(added1, "Cardinality", each [Statistics][DistinctCount]),
    added3 = Table.AddColumn(added2, "NullCount", each [Statistics][NullCount]),
    added4 = Table.AddColumn(added3, "DictionarySize", each [DataPageOffset] - [DictionaryPageOffset]),
    added5 = Table.AddColumn(added4, "LogicalType", each Record.FieldOrDefault(schema, [Column], null)),
    selected = Table.SelectColumns(added5, {"RowGroup", "Column", "Codec", "NumValues", "Cardinality", "NullCount", "TotalCompressedSize", "TotalUncompressedSize", "DictionarySize", "PhysicalType", "LogicalType"})
in
    selected

As you can see this gives you all kinds of useful information about a Parquet file such as the schema, the compression type used, column cardinality and so on.

[Thanks to Curt Hagenlocher for the tip-off and the query above]

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel. View all posts by Chris Webb

Published February 11, 2024February 10, 2024

Reading Parquet metadata in Power Query in Power BI

Reading Parquet Metadata In Power Query In Power BI

Share this:

Published by Chris Webb

Recommend

'We Pulled Off An SEO Heist': Entrepreneur Stole 3.6 Million Pageviews From Comp...

qq邮箱默认发信地址更改教程

SAP Best Practices for SAP Integrated Business Pla... - SAP Community

租对象回家过年已成历史，年轻人更热衷网络上寻找“赛博女友”

谷歌地图如何测量面积

小米13ultra配置详情

出海四小龙紧逼亚马逊，2024谁主沉浮？

超级碗LVIII使用了比之前所有游戏更多的AR、AI、相机和无线技术

New Report Highlights Key Tips to Maximize LinkedIn Performance [Infographic]

reconciliation was canceled recently in SAP B1, ho... - SAP Community

About Joyk