Getting started with Pandas - Letsprogram


Before you use pandas, install it for the environment you are working in. If you are using Anaconda or Miniconda, run conda install pandas in the respective environment. If you use a standard Python installation, download and install pandas with pip using the command pip install pandas. If you have any difficulty installing pandas, check this link: installing pandas

After installing it, import the pandas package into your program or interpreter.

>>> import pandas as pd

pd is just an alias for pandas, so you do not have to type the whole name each time. Using this alias is standard practice.

pandas is a fast, powerful, flexible, and easy to use open-source data analysis and manipulation tool, built on top of the Python programming language.

We now know that pandas is a data manipulation tool, but the question is: what kind of data does pandas handle?

When working with tabular data, such as data stored in spreadsheets or databases, Pandas is the right tool for you. Pandas will help you to explore, clean, and process your data. In Pandas, a data table is called a DataFrame.

Pandas DataFrame Representation


I want to store data about some movies. For each movie I know the name, the year it was released, and the director.



When you build a DataFrame from a Python dictionary, the keys become the column labels and each value (a list) supplies that column's data, one entry per row.
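As a minimal sketch of this idea, the snippet below builds a DataFrame from a dictionary. The three movies are illustrative sample data chosen for this example (the article's original table is not shown here), and the variable name movies is an assumption.

```python
import pandas as pd

# Illustrative sample data, not the article's original table.
movies = {
    "name": ["Titanic", "The Matrix", "The Dark Knight"],
    "year": [1997, 1999, 2008],
    "director": ["James Cameron", "The Wachowskis", "Christopher Nolan"],
}

# Each dictionary key becomes a column label;
# each position in the lists becomes one row.
df = pd.DataFrame(movies)
print(df)
```

Every list must have the same length, otherwise pandas cannot line the values up into rows.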

When you select a single column from a pandas DataFrame, the result is a Series. The syntax is the same as looking up a key in a Python dictionary: put the column name in square brackets, as in df["year"].
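The parallel between a dictionary lookup and a column selection can be sketched as follows. The two-movie dictionary is hypothetical sample data used only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
movies = {"name": ["Titanic", "The Dark Knight"], "year": [1997, 2008]}
df = pd.DataFrame(movies)

years_from_dict = movies["year"]  # dictionary lookup gives back the plain list
years_from_df = df["year"]        # same bracket syntax gives back a Series

print(type(years_from_dict).__name__)
print(type(years_from_df).__name__)
```

The values are the same in both cases; the Series additionally carries an index and the column name.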

You can also create a single column, or Series, from scratch.

>>> ages = pd.Series([19, 16, 32, 29], name="age")
>>> ages
0    19
1    16
2    32
3    29
Name: age, dtype: int64

We stored the movie information in df. To find the release year of the latest movie in the collection, we need to run an operation on the DataFrame.

>>> df["year"].max()
2008

The latest movie was released in 2008. max() is only one of the many operations that pandas provides.

If you need basic statistics for the numerical columns of a DataFrame, use the describe() method: df.describe()
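Here is a small sketch of describe() on a DataFrame that mixes text and numbers. The sample movies are an assumption made for this example. By default, describe() summarizes only the numerical columns, so the name column is left out of the result.

```python
import pandas as pd

# Illustrative sample data (an assumption for this example).
df = pd.DataFrame(
    {
        "name": ["Titanic", "The Matrix", "The Dark Knight"],
        "year": [1997, 1999, 2008],
    }
)

# Returns count, mean, std, min, quartiles and max for each numeric column.
stats = df.describe()
print(stats)

# Individual statistics can be picked out by row label and column name.
print(stats.loc["max", "year"])
```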

Note

This is just a starting point. Similar to spreadsheet software, pandas represents data as a table with columns and rows. Apart from the representation, pandas also supports the data manipulations and calculations you would do in spreadsheet software. Continue reading the next tutorials to get started!


How to read and write data

I had a CSV file with data that I needed to read into a pandas DataFrame. pandas provides the read_csv() function for this. pandas also supports other file formats and data sources such as SQL, Excel, and JSON; each has a function with the prefix read_*().
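The following is a minimal sketch of read_csv(). So that it runs without any file on disk, it reads from an in-memory string through io.StringIO; the three rows are made-up stand-ins, not rows from the real Titanic data set. With a real file you would pass its path instead, for example pd.read_csv("titanic.csv").

```python
import io

import pandas as pd

# Made-up stand-in rows; the real titanic.csv has more columns and 891 rows.
csv_text = """PassengerId,Survived,Pclass,Age
1,0,3,22
2,1,1,38
3,1,3,26
"""

# read_csv accepts a file path or any file-like object.
titanic = pd.read_csv(io.StringIO(csv_text))
print(titanic)
```

The first line of the CSV is used as the column names, and pandas infers a data type for each column.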


If you want to see the first n rows, use the head() method; to see the last n rows, use the tail() method. The examples in this section read their data from a titanic.csv file. Download the titanic.csv file.
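A quick sketch of head() and tail(), using a small made-up DataFrame of ten rows in place of the Titanic data (the column names are borrowed from that data set, the values are invented):

```python
import pandas as pd

# Ten made-up rows standing in for the real data.
df = pd.DataFrame(
    {
        "PassengerId": list(range(1, 11)),
        "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
    }
)

first_three = df.head(3)  # first 3 rows
last_two = df.tail(2)     # last 2 rows
default_head = df.head()  # with no argument, head() returns 5 rows

print(first_three)
print(last_two)
```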

You can get the data type of every column (Series) in the DataFrame through its dtypes attribute.
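As a small sketch, the snippet below checks dtypes on a made-up three-row DataFrame (the values are invented for this example). Note that dtypes is an attribute, not a method, so it is written without parentheses.

```python
import pandas as pd

# Made-up sample rows with an integer, a text and a float column.
df = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Name": ["Passenger A", "Passenger B", "Passenger C"],
        "Fare": [7.25, 71.5, 8.0],
    }
)

# One entry per column: the column name and its data type.
print(df.dtypes)

# The dtype of a single column is available on the Series itself.
fare_dtype = df["Fare"].dtype
print(fare_dtype)
```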

Once you have read data from a CSV file into a pandas DataFrame, you will often want to export it in another format. While the read_*() functions read data into pandas, the to_*() methods write data out of pandas. To know more about the reading and writing methods, check this link: pandas_methods.

>>> titanic.to_excel('titanic.xlsx', sheet_name="passengers")
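Writing Excel files requires an extra Excel engine package to be installed alongside pandas (openpyxl is a common choice). As a dependency-free sketch of the same read_*/to_* pairing, the snippet below writes a made-up DataFrame to CSV in memory and reads it back. The data and the variable names are assumptions for this example.

```python
import io

import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({"PassengerId": [1, 2], "Fare": [7.25, 71.5]})

# to_csv writes to a path or a file-like object.
# index=False leaves the row index out of the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back with read_csv.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Without index=False, the row index is written as an extra unnamed column, which then shows up as a column when the file is read back.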

>>> titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

If you want to know more than the data types, use the info() method on the DataFrame. It also reports the index range, the number of non-null values in each column, and the approximate memory used to hold the DataFrame. From the non-null counts you can work out how many values are missing: for example, Age has 714 non-null values out of 891 entries, so 177 ages are missing. In short, info() gives you a technical summary of the data.
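To make the link between non-null counts and missing values concrete, here is a small sketch on a made-up DataFrame with some gaps. The rows are invented for this example; isna() and sum() are standard pandas methods used here as one possible way to count the gaps directly.

```python
import pandas as pd

# Made-up rows with deliberately missing values (None becomes NaN / missing).
df = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4],
        "Age": [22.0, None, 26.0, None],
        "Cabin": ["C85", None, None, None],
    }
)

# Prints the technical summary, including the non-null count per column.
df.info()

# isna() marks each missing cell as True; sum() counts them per column.
missing = df.isna().sum()
print(missing)
```

The missing count for a column is simply the number of rows minus the non-null count that info() reports.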

