
Data structures - Python Lists, Pandas Series and Numpy Arrays - JournalDev

 2 years ago
source link: https://www.journaldev.com/54383/overview-data-structures-in-python

Data structures – Python Lists, Pandas Series and Numpy Arrays


As a data scientist or analyst, you spend most of your time understanding and analyzing data. To interpret your data well, or even to analyze it at all, knowing data structures is essential. Python has many built-in data structures, such as lists, tuples, dictionaries, and sets.

Similarly, the two main data analysis libraries, Pandas and Numpy, provide data structures of their own. In this article, I will walk you through Python lists, Pandas series, and Numpy arrays. These are building blocks that will help you in many ways.


More About Data Structures

  • A data structure stores data in an organized way so that working with it is easy.
  • Note that a data structure is not a programming language. It is a way of organizing data, along with the operations defined on it, and it can be implemented in any programming language.
  • Data structures are needed because applications keep getting more complex and data keeps growing. As it grows, we may run into problems with processing speed, searching, retrieval, and handling many requests in parallel, all of which can slow down your system. Keeping your data organized helps you get past these issues.
  • There are two types of data structures: primitive and non-primitive. Primitive data structures are operated on directly by machine instructions. Non-primitive data structures are more complex and are derived from the primitive ones.
  • Some of the key operations on data structures are searching, sorting, insertion, deletion, and updating.
  • Their key advantages are efficiency, economical storage, reusability, time savings, and easier data manipulation.
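
The operations listed above can be sketched on a plain Python list. This is only an illustration: the `scores` values are hypothetical sample data, not taken from the article.

```python
# Hypothetical sample data, used only for illustration
scores = [42, 7, 19, 73]

# Searching: membership test and position lookup
has_19 = 19 in scores          # True
position = scores.index(19)    # 2

# Insertion: place a value at a given position
scores.insert(1, 55)           # [42, 55, 7, 19, 73]

# Deletion: remove the first matching value
scores.remove(7)               # [42, 55, 19, 73]

# Updating: overwrite an element in place
scores[0] = 40                 # [40, 55, 19, 73]

# Sorting: reorder the elements in place
scores.sort()
print(scores)                  # [19, 40, 55, 73]
```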

Python Lists

Python has four built-in collection types: dictionaries, tuples, lists, and sets. A list can store values of different data types, such as int, float, and string. A list can also store another list inside it.
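
As a small sketch of that flexibility, the list below mixes several types and nests another list. The name `mixed` and its values are made up for this example.

```python
# A list holding an int, a float, a string, and another list
mixed = [7, 3.14, "python", [1, 2, 3]]

# Each element keeps its own type
type_names = [type(item).__name__ for item in mixed]
print(type_names)      # ['int', 'float', 'str', 'list']

# The nested list is reached with a second index
print(mixed[3][0])     # 1
```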

There are many methods you can use when working with lists in Python. Some of the important ones are append, insert, remove, sort, and copy.

This is not the place to go deeper into lists. So, here are some examples that will help you get to know lists and their operations.

Create a list

#list
demo_list = [1,4,2,5,8,6,9]
#remove the first occurrence of 4
demo_list.remove(4)
demo_list
[1, 2, 5, 8, 6, 9]
#append 10 to the end of the list
demo_list.append(10)
demo_list
[1, 2, 5, 8, 6, 9, 10]

You can perform many list operations such as extend(), count(), sort() and more. Make sure you give it a try.
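
If you want a starting point, here is a minimal sketch of those three methods. It starts again from the original seven values of `demo_list`; the values passed to `extend()` are arbitrary.

```python
demo_list = [1, 4, 2, 5, 8, 6, 9]

# extend() appends every element of another iterable
demo_list.extend([4, 3])       # [1, 4, 2, 5, 8, 6, 9, 4, 3]

# count() returns how many times a value occurs
fours = demo_list.count(4)     # 2

# sort() orders the list in place and returns None
demo_list.sort()
print(demo_list)               # [1, 2, 3, 4, 4, 5, 6, 8, 9]
```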


Numpy Arrays

Numpy is a robust library for numerical computation in Python. An array is a grid of values that all have the same data type. The rank of an array is its number of dimensions. You can perform many array operations such as slicing, indexing, and more.

Let’s see what 1D and 2D arrays look like, and then perform some array operations on them.

#1D array
import numpy as np
demo_1D_array = np.array([11,22,33,44])
demo_1D_array
array([11, 22, 33, 44])
#2D array
demo_2D_array = np.array([[11,22,33,44],[55,66,77,88]])
demo_2D_array
array([[11, 22, 33, 44],
       [55, 66, 77, 88]])
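
Indexing and slicing on the 2D array might look like the sketch below. The array is redefined so the snippet runs on its own; `single_value` and `first_two_cols` are names chosen for this example.

```python
import numpy as np

demo_2D_array = np.array([[11, 22, 33, 44], [55, 66, 77, 88]])

# Indexing: row 1, column 2 (both counted from 0)
single_value = demo_2D_array[1, 2]       # 77

# Slicing: every row, first two columns
first_two_cols = demo_2D_array[:, :2]    # [[11, 22], [55, 66]]

# ndim is the number of dimensions (the rank); shape is the size along each one
print(demo_2D_array.ndim, demo_2D_array.shape)   # 2 (2, 4)
```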

Now, let’s sum up all the values present in the array.

#sum
demo_2D_array.sum()
396
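
Besides the grand total, `sum()` also accepts an `axis` argument. The sketch below, which redefines the array so it runs on its own, adds up each column and each row separately.

```python
import numpy as np

demo_2D_array = np.array([[11, 22, 33, 44], [55, 66, 77, 88]])

# axis=0 collapses the rows, giving one total per column
column_sums = demo_2D_array.sum(axis=0)   # [66, 88, 110, 132]

# axis=1 collapses the columns, giving one total per row
row_sums = demo_2D_array.sum(axis=1)      # [110, 286]

print(column_sums, row_sums)
```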

Fine. Can we now generate random values using Numpy?

#random numbers
random_numbers = np.random.randint(0,5,50)
random_numbers
array([0, 3, 2, 2, 2, 3, 0, 1, 1, 1, 4, 4, 3, 0, 1, 4, 3, 2, 3, 1, 0, 0,
       3, 1, 0, 0, 3, 2, 2, 3, 2, 2, 0, 3, 4, 1, 1, 2, 4, 0, 3, 0, 4, 0,
       1, 0, 2, 4, 0, 0])

Perfect!
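
The values above will differ on every run. If you need repeatable numbers, one option is Numpy's seeded generator, sketched here. This is an alternative to the `np.random.randint()` call above, not what the article used, and the seed 42 is an arbitrary choice.

```python
import numpy as np

# The seed value 42 is an arbitrary choice for illustration
rng = np.random.default_rng(42)

# 50 integers drawn from 0 to 4 (the upper bound 5 is excluded)
random_numbers = rng.integers(0, 5, size=50)

# Re-creating the generator with the same seed reproduces the same values
repeat = np.random.default_rng(42).integers(0, 5, size=50)
print(np.array_equal(random_numbers, repeat))   # True
```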


Pandas Series

Series is a core data structure of Pandas, created with pd.Series(). It is a one-dimensional labeled array that can hold values of any data type.

You can combine one or more series into a data frame. Let’s create a simple series with the pd.Series() function.

#series
import pandas as pd
student = ['Jhon','Gracy','Spidy','Reko']
marks = [87,90,81,94]
#series of marks, labeled by student name
student_marks = pd.Series(marks, index = student)
student_marks
Jhon     87
Gracy    90
Spidy    81
Reko     94
dtype: int64

Looks good.
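
To see how series combine into a data frame, here is a minimal sketch. The `attendance` series is hypothetical data invented for this example; only the names and marks come from the article.

```python
import pandas as pd

student = ['Jhon', 'Gracy', 'Spidy', 'Reko']
marks = pd.Series([87, 90, 81, 94], index=student)

# Hypothetical second series, invented for this example
attendance = pd.Series([92, 88, 79, 95], index=student)

# Each series becomes a column; rows are aligned on the shared index
report = pd.DataFrame({'marks': marks, 'attendance': attendance})
print(report)
```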

You may now be wondering about the title of this article. I have defined lists, arrays, and series so that I can show you how they differ.


Storage

The key difference between them is storage. If we store the same numbers in all three data structures, they can occupy different amounts of memory.

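#storage
import sys
#assumed definitions (the original does not show them): the same ten integers in each structure
lists = list(range(10))
arrays = np.array(lists)
series = pd.Series(lists)
print(f"Lists:{sys.getsizeof(lists)} bytes")
print(f"Arrays:{sys.getsizeof(arrays)} bytes")
print(f"Series:{sys.getsizeof(series)} bytes")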
Lists:136 bytes
Arrays:136 bytes
Series:184 bytes

We import sys to use getsizeof(), which reports the size of an object in bytes. Now, compare the memory taken by each structure. The exact numbers depend on your platform and library versions.
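
Keep in mind that `sys.getsizeof()` on a list counts only the list object itself, not the integer objects it refers to. A fuller comparison might look like the sketch below. The contents (ten integers) are an assumption, and no byte counts are shown because they vary by platform and version.

```python
import sys
import numpy as np
import pandas as pd

# Assumed contents: the same ten integers in each structure
values = list(range(10))
lists = values
arrays = np.array(values)
series = pd.Series(values)

# List object plus the int objects it refers to
list_total = sys.getsizeof(lists) + sum(sys.getsizeof(v) for v in lists)

# nbytes is the size of the array's data buffer alone
array_data = arrays.nbytes

# memory_usage() reports the values plus the index
series_total = series.memory_usage(index=True)

print(list_total, array_data, series_total)
```

The array packs its values into one contiguous buffer, while the list holds references to separate Python objects, so the list total is usually the largest of the three.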


Wrapping Up

Data structures are one of the most important things to be familiar with if you are working with data. In this article, I have shown three different data structures and the memory each one requires. I hope it was a short but informative look at data structures.

That’s all for now. Happy Python!!!


