3

Pandas+SQLAlchemy与数据库交互

 3 years ago
source link: https://www.biaodianfu.com/pandas-sqlalchemy.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

在使用Pandas分析数据时,除了读取文本类数据,如Excel、CSV文件外,还会涉及到数据库的读取。通常的方案是Pandas+ SQLAlchemy。

以下内容是记录的一些操作记录,算是备忘。

使用sqlalchemy连接数据库

Engine 翻译过来就是引擎的意思,汽车通过引擎来驱动,而 SQLAlchemy 是通过 Engine 来驱动,Engine 维护了一个连接池(Pool)对象和方言(Dialect)。方言简单而言就是你连的到底是 MySQL 还是 Oracle 或者 PostgreSQL 还是其它数据库。

连接示例:

from sqlalchemy import create_engine
engine = create_engine('sqlite:///test.db')
# engine = create_engine("mysql+pymysql://root:root@localhost/hackathon")
from sqlalchemy import create_engine

engine = create_engine('sqlite:///test.db')
# engine = create_engine("mysql+pymysql://root:root@localhost/hackathon")

SQLAlchemy支持的数据库包括:PostgreSQL、MySQL、MariaDB、Oracle、Microsoft SQL Server、SQLite等。具体的连接方式这里不再详述。其中最容易出错的部分数据库密码存在特殊字符,需要先编码处理后再日提交。

解决方案:

import urllib.parse
urllib.parse.quote_plus("kx%jj5/g")
import urllib.parse
urllib.parse.quote_plus("kx%jj5/g")

将 DataFrame保存到数据库中

DataFrame自带to_sql方法,使用方法如下:

DataFrame.to_sql(self, name: str, con, schema=None, if_exists: str = 'fail', index: bool = True, index_label=None, chunksize=None, dtype=None, method=None)
DataFrame.to_sql(self, name: str, con, schema=None, if_exists: str = 'fail', index: bool = True, index_label=None, chunksize=None, dtype=None, method=None)

参数说明:

  • name:数据库表名称
  • con:数据库连接
  • schema:数据库模的schema,大部分连接的时候不需要
  • if_exists:可选:{‘fail’, ‘replace’, ‘append’}
  • index:是否将DataFrame的index列写入
  • index_label:index列的名称
  • chunksize:设置批量写入的条目数量
  • dtype:设置字段类型
  • method:插入方法,可不做设置

使用示例:

df.to_sql(con=engine, name='pair_result', if_exists='append')
df.to_sql(con=engine, name='pair_result', if_exists='append')

传统方式:通过游标方式取数据库数据

代码示例:

with engine.connect() as con:
rs = con.execute('SELECT * FROM testdb.abc')
for row in rs:
print(row)
with engine.connect() as con:
    rs = con.execute('SELECT * FROM testdb.abc')
    for row in rs:
        print(row)

将数据库的数据转化为DataFrame:

with engine.connect() as con:
rs = con.execute('SELECT * FROM testdb.abc ')
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
with engine.connect() as con:
    rs = con.execute('SELECT * FROM testdb.abc ')
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

简便方式:直接使用pd.read_sql()读取

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('sqlite:///test.db')
# 查询表数据
df = pd.read_sql("sub_place", engine)
print(df.head())
# 使用SQL查询数据
query = '''SELECT *
FROM sub_place
WHERE father_area_name='中国'
china_df = pd.read_sql(query, engine)
print(china_df.head())
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///test.db')

# 查询表数据
df = pd.read_sql("sub_place", engine)
print(df.head())

# 使用SQL查询数据
query = '''SELECT * 
            FROM sub_place 
            WHERE father_area_name='中国' 
            '''
china_df = pd.read_sql(query, engine)
print(china_df.head())

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK