 2 years ago
source link: https://www.codesd.com/item/the-distribution-of-strings-on-simple-strings-works-but-not-on-a-series-of-strings-in-pandas.html

Splitting strings works on single strings, but not on a Series of strings in pandas


I'm very new to Python and pandas and have an issue. I have a Series of 45,398 strings that I need to edit; I imported them from an Excel file.

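import pandas as pd
import numpy as np
import xlrd

# These xlrd lines are not needed for pd.read_excel, which opens the file itself
# (and xlrd 2.x no longer reads .xlsx files at all).
file_location = "#mypath/leistungen_2017.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

df = pd.read_excel("leistungen_2017.xlsx")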

Here are the first few rows, just as example.

>>> df
   Leistungserbringer  Anzahl  Leistung                                           Code  Rechnungsnummer
0  Albert              1       15.0160 Vollständige Spirometrie und Resistanc...  1     8957
1  Albert              1       15.0200 CO-Diffusion, jede Methode                 1     8957
2  Albert              1       15.0285 Messung ausgeatmetes Stickstoffmonoxid...  1     8957
3  Albert              1       AMC-30864 Spirometriefilter mit Mundstück          1     8957
4  Albert              1       5889797 RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos  1     8957
5  Albert              1       00.0010 Konsultation, erste 5 Min. (Grundkonsu...  1     8957

In the Leistung column, each entry starts with a code (for example 15.0160, AMC-30864 or 5889797) followed by the description, and I want to remove that leading code for the whole Series.

I experimented with single strings, and this works fine:

>>> str("15.0200 CO-Diffusion, jede Methode".split(' ', 1)[1:]).strip('[]')
"'CO-Diffusion, jede Methode'"
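For a single string, indexing the split result with [1] instead of slicing with [1:] returns the description itself as a str. This makes the str(...) and .strip('[]') steps unnecessary, and those steps are what leave the quote characters behind. The same split is also available element-wise on a Series through the .str accessor. Here is a minimal sketch on two values from the question; the variable names are illustrative:

```python
import pandas as pd

text = "15.0200 CO-Diffusion, jede Methode"
# split(" ", 1) cuts at the first space only; [1] is the text after the code, as a str.
description = text.split(" ", 1)[1]
print(description)  # CO-Diffusion, jede Methode

s = pd.Series([text, "AMC-30864 Spirometriefilter mit Mundstück"])
# Element-wise version: n=1 limits each split to the first space,
# and .str[1] picks the second piece of each resulting list.
print(s.str.split(" ", n=1).str[1].tolist())
```

A value without any space has no second piece. The plain-string version raises IndexError on such a value, while .str[1] yields NaN for that row.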

I tried to apply this to the whole series with:

for entry in df.Leistung:
    df.Leistung.replace({entry : str(entry.split(' ', 1)[1:]).strip('[]')},  inplace=True)

The outcome for df.Leistung should look something like this:

0        Vollständige Spirometrie und Resistance (Plet...
1                             CO-Diffusion, jede Methode
2         Messung ausgeatmetes Stickstoffmonoxid ({eNO})
3                        Spirometriefilter mit Mundstück
4              RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos
5         Konsultation, erste 5 Min. (Grundkonsultation)

Instead, I receive this:

0
1
2
3
4
5

One row gives this:

45384    'Dos\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'"\\\\\\\\\...

I need to replace the old values with the new ones in the same column. I hope this is understandable, and thanks in advance for any help.
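The thread does not explain why the loop produces blanks and backslashes. One plausible reading, which is an assumption, is that some values get the one-liner applied more than once. For instance, replace(..., inplace=True) rewrites every row that has the same text, and the loop may later revisit rows it has already changed. Applying the question's one-liner repeatedly to a single value from the question reproduces both symptoms. The helper name below is hypothetical:

```python
def strip_code_as_in_question(s: str) -> str:
    # The question's one-liner: keep everything after the first space, but as a
    # one-element *list*, then turn that list into text with str() and strip the
    # surrounding brackets. The quotes from the list's repr stay behind.
    return str(s.split(' ', 1)[1:]).strip('[]')


value = "00.0010 Konsultation, erste 5 Min."
history = [value]
for step in range(1, 6):
    value = strip_code_as_in_question(value)
    history.append(value)
    print(step, repr(value))
```

Each pass drops one more word and wraps the rest in the repr of a list, so quote characters accumulate and soon get escaped with backslashes. Once no space is left, the slice [1:] is empty and the value becomes the empty string; for this input, that happens on the fifth pass.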


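You don't need loops in pandas; string operations are vectorised. The replace function you are after lives in the .str namespace. Since every entry starts with a code followed by a space, anchor the pattern at the start of the string (a bare \d+ would also delete the digits inside the descriptions) and assign the result back to the column:

df['Leistung'] = df['Leistung'].str.replace(r'^\S+\s+', '', regex=True)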
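Here is a self-contained sketch of the vectorised approach on four values copied from the question. It assumes, as in every sample row, that the code is the first whitespace-separated token. For contrast, it also shows that a bare r"\d+" deletes every run of digits, including those inside the description, and leaves the non-digit parts of the codes behind:

```python
import pandas as pd

# A few values from the question's Leistung column.
df = pd.DataFrame({
    "Leistung": [
        "15.0200 CO-Diffusion, jede Methode",
        "AMC-30864 Spirometriefilter mit Mundstück",
        "5889797 RELVAR ELLIPTA Inh Plv 92mcg/22mcg 30 Dos",
        "00.0010 Konsultation, erste 5 Min. (Grundkonsultation)",
    ]
})

# For comparison: removing every run of digits also hits "92mcg", "30", "5 Min."
# and leaves "." or "AMC-" from the codes in place.
naive = df["Leistung"].str.replace(r"\d+", "", regex=True)
print(naive.tolist())

# Anchored pattern: remove only the first whitespace-separated token (the code)
# and the whitespace after it. regex=True is passed explicitly because pandas 2.0
# and later treat the pattern as a literal string by default.
df["Leistung"] = df["Leistung"].str.replace(r"^\S+\s+", "", regex=True)
print(df["Leistung"].tolist())
```

If some codes could contain spaces, this first-token assumption breaks, and a pattern describing the actual code formats would be needed instead.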

