
OneDrive as Data Storage for Python Project

 2 years ago
source link: https://towardsdatascience.com/onedrive-as-data-storage-for-python-project-2ff8d2d3a0aa

Using OneDrive API to sync OneDrive files directly to your Python projects

illustration for this article by Chaeyun Kim

You may have already heard the phrase “Data is the new oil”. As data scientists, we work on the data science process: refining this new oil so that it is valuable and ready to use. The most fundamental step of the data science process is data storage. In this article, I am going to show an example of how to use cloud technology as data storage.

OneDrive is one of the most efficient cloud storage options available in terms of pricing and capacity, and it is very easy to get. The free version gives you 5 GB of storage, and you can always subscribe to get more. I have subscribed to Microsoft 365 Family myself and get a total of 6 TB of cloud space along with the MS Office software. I think it is a great opportunity to use this space for data science projects with Python.

The Problem!?

Unfortunately, you cannot use a file directly via the share URL from OneDrive: the URL returns an HTML page from OneDrive.com, which requires you to click a download button before the file can be used in your project.

In this short article, I will focus on how to sync files from OneDrive directly to Python in a few lines of code.

Create OneDrive Direct Download Link

Step 1: Share files through OneDrive and get a download link

This step is relatively simple. Upload or share a file using OneDrive, then click “Share” followed by “Copy Link” to create a cloud link.

Create OneDrive Link (by Author)

Step 2: Convert OneDrive URL to Direct Download URL

To download your OneDrive files directly in Python, the shared URL from Step 1 has to be converted into a direct download URL that conforms to the OneDrive API guide. Alternatively, you can follow my script below, which uses the base64 module.

onedrive_dd_generator.py (by Author)
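
The conversion can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the share-token encoding described in the OneDrive API documentation (Base64-encode the shared link, drop the "=" padding, replace "/" with "_" and "+" with "-", then prefix "u!") and the api.onedrive.com v1.0 shares endpoint. The function name is an illustrative choice.

```python
import base64


def create_onedrive_directdownload(onedrive_link: str) -> str:
    """Convert a OneDrive share link into a direct download URL (sketch)."""
    # URL-safe Base64 maps '+' to '-' and '/' to '_', as the share-token
    # format requires.
    encoded = base64.urlsafe_b64encode(onedrive_link.encode("utf-8")).decode("utf-8")
    # The share token is the encoded link without '=' padding, prefixed with "u!".
    share_token = "u!" + encoded.rstrip("=")
    # Assumption: the api.onedrive.com v1.0 "shares" endpoint returns the
    # file content for this token.
    return f"https://api.onedrive.com/v1.0/shares/{share_token}/root/content"
```

Whether the resulting URL serves the file without signing in may depend on the account type and on how the link was shared, so it is worth opening the generated URL in a browser once before relying on it.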

You can pass the shared OneDrive URL from Step 1 into the function above to get a direct download URL.

Use Case

Import Excel on OneDrive to a Pandas DataFrame

Let’s try using the steps above with the sample time-series dataset below. It is hosted on my OneDrive.

We can use the script from the two steps above to generate a direct download link, then import the Excel data directly using Pandas. The full sample script is shown as follows:
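
As a hedged illustration of how the two steps might fit together, the sketch below converts a share link and hands the result to pandas. The helper names and the placeholder share link are assumptions made for illustration, and reading .xlsx files with pandas additionally requires the openpyxl package.

```python
import base64


def create_onedrive_directdownload(onedrive_link: str) -> str:
    """Convert a OneDrive share link into a direct download URL (sketch)."""
    # URL-safe Base64 without '=' padding, prefixed with "u!", forms the share token.
    encoded = base64.urlsafe_b64encode(onedrive_link.encode("utf-8")).decode("utf-8")
    share_token = "u!" + encoded.rstrip("=")
    # Assumption: the api.onedrive.com v1.0 "shares" endpoint returns the file content.
    return f"https://api.onedrive.com/v1.0/shares/{share_token}/root/content"


def read_onedrive_excel(onedrive_link: str, **read_excel_kwargs):
    """Load an Excel workbook shared on OneDrive into a pandas DataFrame."""
    # Imported lazily so the URL helper stays usable without pandas installed.
    import pandas as pd

    direct_url = create_onedrive_directdownload(onedrive_link)
    # pandas.read_excel accepts a URL as well as a local path.
    return pd.read_excel(direct_url, **read_excel_kwargs)


# Example usage (placeholder link; replace it with your own share link):
# df = read_onedrive_excel("https://1drv.ms/x/s!YOUR_SHARE_ID")
# print(df.head())
```

Extra keyword arguments such as sheet_name are passed straight through to pandas.read_excel, so the same helper can be reused for workbooks with several sheets.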

