
How to use Data Lake Files

source link: https://blogs.sap.com/2021/11/23/how-to-use-data-lake-files/
November 23, 2021 2 minute read

What is Data Lake Files?

Data lake Files is a component of SAP HANA Cloud that provides secure, efficient storage for large amounts of structured, semi-structured, and unstructured data. Data lake Files is automatically enabled when you provision a data lake instance.

Provisioning creates the data lake Files container as a storage location for files. This file store lets you use data lake as a repository for big data. For more information on provisioning data lake Files with your data lake instance, see Creating SAP HANA Cloud Instances.

Configuring the File Container

I will walk through the steps using the REST API.

  1. Create a HANA database with a data lake on BTP.
  2. Make sure SAP Native is selected as the storage service type.
  3. In SAP HANA Cloud on BTP, open the data lake instance's Actions menu and choose Open SAP HANA Cloud Central.
  4. Configure the file container as described in Setting Up Initial Access to HANA Cloud data lake Files.

The data lake Files configuration is now complete.

Using the File Container

We can now list, fetch, or upload files through the REST API.

  • Copy the instance ID and run the following commands locally, from the folder that holds your client certificate and key (client.crt and client.key).

Get list status:

curl --insecure -H "x-sap-filecontainer: {{instance-id}}" --cert ./client.crt --key ./client.key "https://{{instance-id}}.files.hdl.canary-eu10.hanacloud.ondemand.com/webhdfs/v1/user/home/?op=LISTSTATUS" -X GET

The response lists the files and directories under /user/home/.
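For scripting, the listing call can be mirrored in Python. The following is a minimal sketch, not an official client: it assumes the response body follows the WebHDFS LISTSTATUS JSON layout (FileStatuses / FileStatus), and the sample body, the function names, and the instance ID "my-instance-id" are made up for illustration. The host suffix is taken from the curl command above.

```python
import json


def liststatus_url(instance_id, path="/user/home/",
                   endpoint="files.hdl.canary-eu10.hanacloud.ondemand.com"):
    """Build the same LISTSTATUS URL that the curl command uses."""
    return f"https://{instance_id}.{endpoint}/webhdfs/v1{path}?op=LISTSTATUS"


def list_entries(response_text):
    """Return (name, type, length) tuples from a WebHDFS-style LISTSTATUS body."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


# Illustrative response body, not captured from a real instance.
sample = ('{"FileStatuses": {"FileStatus": ['
          '{"pathSuffix": "Studies.csv", "type": "FILE", "length": 2048}]}}')

print(liststatus_url("my-instance-id"))
print(list_entries(sample))
```

Keeping the URL builder separate from the parser makes it easy to check both without touching the network.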

  • To upload a file, run the following command:
    curl --location-trusted --insecure -H "Content-Type:application/octet-stream" -H "x-sap-filecontainer: {{instance-id}}" --cert ./client.crt --key ./client.key --data-binary "@Studies.csv" "https://{{instance-id}}.files.hdl.canary-eu10.hanacloud.ondemand.com/webhdfs/v1/user/home/Studies.csv?op=CREATE&data=true&overwrite=true" -X PUT
  • Now get the list status again; the file you just uploaded appears in the listing.
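The upload can be scripted in the same way. This is a rough Python equivalent of the curl upload command, using only the standard library; the function names are hypothetical and the sketch has not been run against a real instance. Two differences from curl are worth noting: server certificate verification stays enabled here (curl's --insecure turns it off), and urllib does not re-send a PUT body on a redirect the way --location-trusted does, so this assumes the endpoint accepts the data directly.

```python
import ssl
import urllib.request

# Host suffix taken from the curl command; adjust it to your own landscape.
ENDPOINT = "files.hdl.canary-eu10.hanacloud.ondemand.com"


def build_upload_request(instance_id, remote_path, payload):
    """Build the PUT request (op=CREATE) without sending it."""
    url = (f"https://{instance_id}.{ENDPOINT}/webhdfs/v1{remote_path}"
           "?op=CREATE&data=true&overwrite=true")
    headers = {
        "Content-Type": "application/octet-stream",
        "x-sap-filecontainer": instance_id,
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


def upload(instance_id, local_file, remote_path,
           cert="./client.crt", key="./client.key"):
    """Send the file using the client certificate; returns the HTTP status."""
    with open(local_file, "rb") as fh:
        request = build_upload_request(instance_id, remote_path, fh.read())
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert, keyfile=key)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

A call such as upload("<instance-id>", "Studies.csv", "/user/home/Studies.csv") would correspond to the curl command above.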

Read the contents of the file into the DB table

Open the SAP HANA database explorer and connect to your database.

Note that in this step the table columns in the database must match the columns in the CSV file.
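One way to catch a mismatch before loading is to compare the CSV header with the column list. The sketch below is only an illustration: it assumes the file has a header row whose names mirror the table column names used in this post (the load itself maps columns by position), and the function name header_mismatches is made up.

```python
import csv
import io

# Column list used for the MANAGEMENT_STUDIES table in this post.
EXPECTED = ["status_code", "study_num", "description", "study_ID", "protocol_ID",
            "lastSubjectLastVisit", "isLeanStudy", "studyPhase", "ID"]


def header_mismatches(csv_text, expected=EXPECTED):
    """Compare the first CSV row with the expected columns, position by position."""
    header = [name.strip() for name in next(csv.reader(io.StringIO(csv_text)))]
    problems = []
    for position, (found, wanted) in enumerate(zip(header, expected), start=1):
        if found.lower() != wanted.lower():
            problems.append(f"column {position}: found {found!r}, expected {wanted!r}")
    if len(header) != len(expected):
        problems.append(f"column count: found {len(header)}, expected {len(expected)}")
    return problems
```

An empty list means the header lines up with the table; otherwise each entry names the first place where the two differ.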

Here I load the data into a data lake IQ table. The hdlfs:/// path must point to the location where the file was uploaded:

CALL SYSHDL_BUSINESS_CONTAINER.REMOTE_EXECUTE('
LOAD TABLE MANAGEMENT_STUDIES
(status_code,study_num,description,study_ID,protocol_ID,lastSubjectLastVisit,isLeanStudy,studyPhase,ID) 
FROM ''hdlfs:///user/home/archiving/Studies.csv'' 
format csv 
SKIP 1 
DELIMITED BY '','' 
ESCAPES OFF' );
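Because the LOAD TABLE statement is passed to REMOTE_EXECUTE as a string, every single quote inside it has to be doubled, which is easy to get wrong by hand. The following sketch generates the call from its parts; the helper names are hypothetical, and it assumes the table name, columns, and path come from a trusted source, since the values are inserted into the SQL text as-is.

```python
def load_table_sql(table, columns, hdlfs_path):
    """Build the inner LOAD TABLE statement with ordinary single quotes."""
    return (f"LOAD TABLE {table} ({','.join(columns)}) "
            f"FROM '{hdlfs_path}' FORMAT CSV SKIP 1 DELIMITED BY ',' ESCAPES OFF")


def remote_execute_call(inner_sql,
                        procedure="SYSHDL_BUSINESS_CONTAINER.REMOTE_EXECUTE"):
    """Wrap a statement in the CALL, doubling its single quotes."""
    escaped = inner_sql.replace("'", "''")
    return f"CALL {procedure}('{escaped}');"


statement = load_table_sql(
    "MANAGEMENT_STUDIES",
    ["status_code", "ID"],  # shortened column list, for illustration only
    "hdlfs:///user/home/archiving/Studies.csv",
)
print(remote_execute_call(statement))
```

The printed text can be pasted into the SQL console of the database explorer.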

You can use data lake Files to store unstructured data or archived files, which makes it a good new option alongside object stores, AWS, and similar services.

