Before using pandas, install it on your computer according to the environment you work in. If you use Anaconda or Miniconda, run conda install pandas in the corresponding environment. If you use a standard Python installation, download and install pandas with pip by running pip install pandas. If you have any difficulty installing pandas, check this link: installing pandas.
After installing it, import the pandas package into your program or interpreter.
>>>import pandas as pd
pd is just an alias for pandas, so you do not have to type the whole name; using it is standard practice.
pandas is a fast, powerful, flexible, and easy to use open-source data analysis and manipulation tool, built on top of the Python programming language.
We now know that pandas is a data manipulation tool, but the question is:
What kind of data does pandas handle?
When working with tabular data, such as data stored in spreadsheets or databases, Pandas is the right tool for you. Pandas will help you to explore, clean, and process your data. In Pandas, a data table is called a DataFrame.
Pandas DataFrame Representation
I want to store data about some movies. For each movie I know its name, the year it was released, and its director.
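A minimal sketch of how such a table could be built from a Python dictionary. The three films and the column names name, year, and director are illustrative choices, not data taken from the original example.

```python
import pandas as pd

# Illustrative data: each dictionary key becomes a column label,
# and each list holds the values of that column.
movies = {
    "name": ["Titanic", "The Matrix", "The Dark Knight"],
    "year": [1997, 1999, 2008],
    "director": ["James Cameron", "The Wachowskis", "Christopher Nolan"],
}

df = pd.DataFrame(movies)
print(df)
```

Each row of the printed table is one movie, and pandas adds a default integer index (0, 1, 2) on the left.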
When you select a single column from a pandas DataFrame, the result is a Series. Selecting a column from a DataFrame uses the same square-bracket syntax as looking up a key in a Python dictionary.
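As a small sketch of that column selection, assuming a hypothetical two-column table (the data below is only for illustration):

```python
import pandas as pd

# Hypothetical table, used only to demonstrate column selection.
df = pd.DataFrame({"name": ["Titanic", "The Dark Knight"], "year": [1997, 2008]})

years = df["year"]            # same square-bracket syntax as a dict lookup
print(type(years).__name__)   # Series
print(years.name)             # year
```

The Series keeps the column label as its name, so it stays clear where the values came from.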
You can also create a single column, or Series, from scratch.
>>>ages=pd.Series([19,16,32,29],name="age")
>>>ages
0 19
1 16
2 32
3 29
Name: age, dtype: int64
We stored the movie information in df. To find the release year of the latest movie in the collection, we apply an operation to the year column of the DataFrame.
>>>df["year"].max( )
2008
The latest movie was released in 2008. Besides max( ), pandas provides many other functions.
If you need basic statistics for the numerical columns of a DataFrame, use the describe( ) method: df.describe( )
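A short sketch of describe( ) on a small table. The data is made up for illustration; with the default settings only the numeric column is summarized.

```python
import pandas as pd

# Illustrative table with one text column and one numeric column.
df = pd.DataFrame({
    "name": ["Titanic", "The Matrix", "The Dark Knight"],
    "year": [1997, 1999, 2008],
})

# By default describe() summarizes only the numeric columns,
# so the text column "name" is left out of the result.
summary = df.describe()
print(summary)

# Individual statistics can be read from the result by row label.
print(summary.loc["max", "year"])
```

The result is itself a DataFrame, with rows such as count, mean, std, min, and max.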
How to read and write data
I had a CSV file with data that I needed to read into a pandas DataFrame. pandas provides the read_csv( ) function for this. pandas also supports other formats such as SQL, Excel, and JSON; each has a reader function with the prefix
read_*( )
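A minimal sketch of read_csv( ). To keep it runnable without a file on disk, the CSV content is a hypothetical string wrapped in io.StringIO; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content. io.StringIO lets read_csv treat the string
# like an open file, so no file on disk is needed.
csv_text = """name,age,survived
Alice,29,1
Bob,41,0
Carol,35,1
"""

passengers = pd.read_csv(io.StringIO(csv_text))
print(passengers.head())   # head() shows the first rows (up to five)
```

The first line of the CSV is used as the column labels by default.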
You can get the data type of every column (Series) in a DataFrame through its dtypes attribute.
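For illustration, here is dtypes on a small made-up table. Note that dtypes is an attribute, so it is used without parentheses.

```python
import pandas as pd

# Made-up table with a text, an integer, and a floating-point column.
passengers = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [29, 41],
    "fare": [7.25, 71.28],
})

# dtypes returns a Series: column labels as the index, data types as values.
print(passengers.dtypes)
```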
Once you have read data from a CSV file into a pandas DataFrame, you may want to export it in a format that suits you. While the read_*( ) functions read data into pandas, the to_*( ) methods write data out of pandas. To learn more about the reading and writing methods, check this link: pandas_methods.
>>>titanic.to_excel('titanic.xlsx', sheet_name="passengers")
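The same write-out pattern can be sketched with to_csv( ), chosen here only because writing Excel files needs an extra engine such as openpyxl to be installed. The table below is hypothetical.

```python
import io
import pandas as pd

# Hypothetical table, used only to demonstrate writing data out.
passengers = pd.DataFrame({"name": ["Alice", "Bob"], "age": [29, 41]})

# With no path argument, to_csv returns the CSV text as a string.
# index=False leaves the row labels out of the output.
csv_text = passengers.to_csv(index=False)
print(csv_text)

# Round trip: the exported text can be read back with read_csv.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Passing a file name as the first argument, for example a hypothetical "passengers.csv", would write the same text to disk instead.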
>>> titanic.info()
If you want to know more than the data types, use the info( ) method on the DataFrame. It also reports the index range and the approximate memory used to hold the DataFrame. The non-null count for each column shows how many values are missing. In short, info( ) gives a technical summary of the data.
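A small sketch of info( ) on a made-up table with one missing value. The buf argument is used here only to capture the report as text; calling info( ) with no arguments prints it directly.

```python
import io
import pandas as pd

# Made-up table; the None in "age" becomes a missing value (NaN).
passengers = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [29, None, 35],
})

# Print the summary: index range, columns, non-null counts,
# data types, and memory usage.
passengers.info()

# The same report can be captured as a string through the buf parameter.
buffer = io.StringIO()
passengers.info(buf=buffer)
report = buffer.getvalue()
```

Comparing each column's non-null count with the number of entries shows how many values are missing; here the age column has 2 non-null values out of 3 entries.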