
Data Science with Python: Intro to Data Visualization and Matplotlib

 6 years ago
source link: https://www.tuicool.com/articles/hit/mUZvyyM

Intro

When a data scientist works with data, that data is typically stored in CSV files, Excel files, databases, and other formats. It is also commonly loaded as a pandas DataFrame. For simplicity, the examples here use Python lists to hold our data. I’m assuming that you have some knowledge of Python data types, functions, methods, and packages. If you don’t, I suggest you read my previous article, which covers these topics.

Data Visualization

Data visualization is a very important part of data analysis. You can use it to explore your data: the better you understand your data, the better your chance of finding insights. Once you find insights, you can use visualizations again to share your findings with other people.

For example, look at the nice plot below. This plot shows the Life Expectancy and Income of 182 nations in the year 2015. Each bubble represents a country, the color represents a region, and the size represents the population of that country.

Life Expectancy vs Income in 2015. Source: https://www.gapminder.org/downloads/updated-gapminder-world-poster-2015/

If you’re interested in which data sources are used, you can find more information here. There is also an awesome interactive version of this chart available here, in which you can play historic time series, search for a certain country, change the data on the axes, and so on. Here is a video that shows how to use this interactive chart.

However, the idea here is to learn the fundamentals of Data Visualization and Matplotlib. So, our plots will be much simpler than that example.

Basic Visualization Rules

Before we look at some kinds of plots, we’ll introduce some basic rules. Those rules help us make nice and informative plots instead of confusing ones.

  • First, choose the appropriate plot type. If there are several possible variants, we can compare them and pick the one that fits our data best.
  • Second, once we have chosen the type of plot, one of the most important things is to label the axes. Without axis labels, the plot is not informative enough. We could look at the code to see what data is used and, if we’re lucky, we’ll understand the plot. But what if we have just the plot as an image? What if we show this plot to a boss who doesn’t know how to make plots in Python?
  • Third, we can add a title to make our plot more informative.
  • Fourth, add labels for different categories when needed.
  • Fifth, optionally, we can add text or an arrow at interesting data points.
  • Sixth, in some cases we can vary the size and color of the data points to make the plot more informative.
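
To make these rules concrete, here is a minimal sketch that applies them with Matplotlib (the package is introduced in the next section). The two cities and all of their values are hypothetical and exist only for illustration, and the particular colors and annotation text are my own assumptions, not something prescribed by the rules.

```python
import matplotlib.pyplot as plt

# Hypothetical values for illustration only (not real data).
years = [2011, 2012, 2013, 2014, 2015]
city_a = [1.0, 1.2, 1.5, 2.4, 2.6]  # population in millions
city_b = [2.0, 2.1, 2.1, 2.2, 2.3]  # population in millions

# Rule 1: a line plot is a reasonable choice for values ordered in time.
# Rule 4: the `label` argument names each category for the legend.
# Rule 6: a distinct color per category makes the lines easy to tell apart.
plt.plot(years, city_a, color="tab:blue", label="City A")
plt.plot(years, city_b, color="tab:orange", label="City B")

# Rule 2: label the axes.
plt.xlabel("Year")
plt.ylabel("Population (millions)")

# Rule 3: add a title.
plt.title("Population of two hypothetical cities")

# Draw the legend from the labels given to plt.plot().
plt.legend()

# Rule 5: point an arrow at an interesting data point.
plt.annotate(
    "City A passes City B",
    xy=(2014, 2.4),          # the point the arrow points to
    xytext=(2011.2, 2.5),    # where the text is placed
    arrowprops={"arrowstyle": "->"},
)

# Show one tick per year instead of fractional years.
plt.xticks(years)

plt.show()
```

Even if you have never seen the code, the resulting image tells you what is measured, in which units, and which line belongs to which city.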

Types of Visualizations and Examples with Matplotlib

There are many types of visualizations. Some of the most famous are the line plot, scatter plot, histogram, box plot, bar chart, and pie chart. But among so many options, how do we choose the right visualization? First, we need to do some exploratory data analysis. Once we know the shape of the data, the data types, and other useful statistical information, it is easier to pick the right visualization type. By the way, when I use the words “plot”, “chart”, and “visualization”, I mean the same thing. Here, I found an image with chart suggestions that can be useful.

There are many visualization packages in Python. One of the most famous is Matplotlib. It can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, and web application servers.

Before we jump into the definitions and examples, I want to show you some basic functions of the matplotlib.pyplot subpackage that we’ll see in the examples below. Here, I am assuming that the matplotlib.pyplot subpackage is imported with the alias plt.

  • plt.title("My Title") adds the title "My Title" to your plot.
  • plt.xlabel("Year") adds the label "Year" to your x-axis.
  • plt.ylabel("Population") adds the label "Population" to your y-axis.
  • plt.xticks([1, 2, 3, 4, 5]) sets the tick locations on the x-axis to 1, 2, 3, 4, 5. We can also pass labels as a second argument. For example, plt.xticks([1, 2, 3, 4, 5], ["1M", "2M", "3M", "4M", "5M"]) shows the labels 1M, 2M, 3M, 4M, 5M at those locations on the x-axis.
  • plt.yticks() works the same as plt.xticks(), but for the y-axis.
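
Here is a small sketch that puts the functions above together. The values are hypothetical and only there so that we have something to draw. I use plt.yticks() with custom labels here, which takes the same arguments as the plt.xticks() call described above.

```python
import matplotlib.pyplot as plt

# Hypothetical values for illustration only (not real data).
years = [2011, 2012, 2013, 2014, 2015]
population = [1, 1.5, 2.5, 3.5, 5]  # in millions

# Draw something so that the title, labels, and ticks have a plot to attach to.
plt.plot(years, population)

plt.title("My Title")
plt.xlabel("Year")
plt.ylabel("Population")

# Put one tick at each year on the x-axis.
plt.xticks(years)

# Put ticks at 1..5 on the y-axis and display them as "1M".."5M".
plt.yticks([1, 2, 3, 4, 5], ["1M", "2M", "3M", "4M", "5M"])

plt.show()
```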

Line Plot: a type of plot which displays information as a series of data points called “markers” connected by straight lines. In this type of plot, we need the measurement points to be ordered (typically by their x-axis values). This type of plot is often used to visualize a trend in data over intervals of time - a time series.

To make a line plot with Matplotlib, we call plt.plot(). The first argument is used for the data on the horizontal axis, and the second is used for the data on the vertical axis. This function generates your plot, but it doesn’t display it. To display the plot, we need to call the plt.show() function. This is nice because we might want to add some additional customizations to our plot before we display it. For example, we might want to add labels to the axes and a title to the plot.

Simple Line Plot
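
The original code for this example is not included here, so what follows is a minimal sketch of a simple line plot rather than the exact listing. The years and values are hypothetical, and the marker="o" argument is my own addition to make the individual data points visible.

```python
import matplotlib.pyplot as plt

# Hypothetical values for illustration only (not real data).
# The points are already ordered by their x-axis values (the years).
years = [2011, 2012, 2013, 2014, 2015]
population = [1.0, 1.4, 1.9, 2.5, 3.2]  # in millions

# First argument: horizontal axis. Second argument: vertical axis.
# marker="o" draws a circle at each data point; the points are
# connected by straight lines.
plt.plot(years, population, marker="o")

# Customizations added after plt.plot() and before plt.show().
plt.title("Population by year (hypothetical data)")
plt.xlabel("Year")
plt.ylabel("Population (millions)")
plt.xticks(years)

# Nothing is displayed until this call.
plt.show()
```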