GitHub - awslabs/aws-data-wrangler: Pandas on AWS - Easy integration with...

3 years ago

source link: https://github.com/awslabs/aws-data-wrangler
Quick Start

Installation command: pip install awswrangler

For platforms without PyArrow 3 support (e.g. EMR, Glue PySpark Job, MWAA): pip install pyarrow==2 awswrangler

import awswrangler as wr
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Get a Redshift connection from the Glue Catalog and retrieve data from Redshift Spectrum
con = wr.redshift.connect("my-glue-connection")
df = wr.redshift.read_sql_query("SELECT * FROM external_schema.my_table", con=con)
con.close()

# Amazon Timestream Write
df = pd.DataFrame({
    "time": [datetime.now(), datetime.now()],
    "my_dimension": ["foo", "boo"],
    "measure": [1.0, 1.1],
})
rejected_records = wr.timestream.write(
    df,
    database="sampleDB",
    table="sampleTable",
    time_col="time",
    measure_col="measure",
    dimensions_cols=["my_dimension"],
)

# Amazon Timestream Query
wr.timestream.query("""
SELECT time, measure_value::double, my_dimension
FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
""")

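The Timestream write above assigns its return value to rejected_records but never inspects it. The sketch below shows one possible way to summarize that list. It assumes each entry is a dict with an optional "Reason" key, as in Timestream's rejected-record responses. The helper name summarize_rejected and the sample data are hypothetical and are not part of awswrangler.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_rejected(rejected_records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count rejected records by reported reason (hypothetical helper)."""
    # Assumption: each entry may carry a "Reason" key; entries without one
    # are grouped under "unknown".
    reasons = Counter(
        str(record.get("Reason", "unknown")) for record in rejected_records
    )
    return dict(reasons)


# Made-up sample shaped like the assumed rejected-record entries.
sample = [
    {"RecordIndex": 0, "Reason": "duplicate"},
    {"RecordIndex": 3, "Reason": "duplicate"},
    {"RecordIndex": 5},
]
summary = summarize_rejected(sample)
print(summary)
```

In the Quick Start flow, summarize_rejected(rejected_records) could be called right after wr.timestream.write to check whether any rows were dropped. An empty list yields an empty summary.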