## Informatics Practices Class 12 Notes – Python Pandas

→ Introduction to Python Libraries: Python libraries contain collections of built-in modules that let us perform many actions without writing detailed programs for them.

→ NumPy, Pandas and Matplotlib are three well-established Python libraries for scientific and analytical use. These libraries allow us to manipulate, transform and visualise data easily and efficiently.

→ NumPy: NumPy, which stands for ‘Numerical Python’, is a library package that can be used for numerical data analysis and scientific computing.

→ Pandas: Pandas stands for ‘PANel DAta’. It is a high-level data manipulation tool used for analysing data.

→ Matplotlib: The Matplotlib library in Python is used for plotting graphs and visualisation. Using Matplotlib, with just a few lines of code we can generate publication quality plots, histograms, bar charts, scatter plots, etc.

→ Difference between Pandas and NumPy: Following are some of the differences between Pandas and NumPy:

→ A NumPy array requires homogeneous data, while a Pandas DataFrame can have different data types (float, int, string, etc.) in different columns.

→ Pandas DataFrames (with column names) make it very easy to keep track of data.

→ Pandas is used when data is in tabular format, whereas NumPy is used for numeric array based data manipulation.
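The first difference can be seen in a minimal sketch, assuming both libraries are installed. The variable names and sample values are illustrative and do not come from the notes.

```python
import numpy as np
import pandas as pd

# A NumPy array holds one data type: mixing numbers and text
# turns every element into a string.
mixed_array = np.array([1, 2.5, "three"])
print(mixed_array.dtype.kind)   # 'U' stands for Unicode string

# A DataFrame keeps a separate data type for each column.
mixed_frame = pd.DataFrame({
    "roll_no": [1, 2, 3],
    "marks": [78.5, 91.0, 66.25],
    "name": ["Asha", "Ravi", "Meena"],
})
print(mixed_frame.dtypes)
```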

→ Installing Pandas: To install Pandas from command line, we need to type in: pip install pandas

→ Importing Pandas: In order to work with Pandas in Python, we need to import Pandas library in Python environment. We can do this either on the shell prompt or in our script file (.py) by writing: import pandas as pd

→ Pandas Data Structure: A data structure is a particular way of storing and organising data in a computer to suit a specific purpose so that it can be accessed and worked with in appropriate ways. Two commonly used data structures in Pandas are:

→ Series: It is a one-dimensional data structure of Python Pandas.

→ DataFrame: It is a two-dimensional data structure of Python Pandas.

→ Series Data Structure: A series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc.) which by default have numeric data labels starting from zero. The data label associated with a particular value is called its index.

Example:
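A small sketch of a series with its default index, assuming pandas is imported as pd. The name marks and its values are made up for illustration.

```python
import pandas as pd

# Only the values are given; pandas adds the labels 0, 1, 2 by default.
marks = pd.Series([67, 82, 91])
print(marks)

# The default numeric data labels (the index) of the series.
print(list(marks.index))
```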

→ Creation of Series: A series can be created in many ways using the Series() function of the Pandas library. Make sure that the pandas and NumPy modules have been imported with import statements.

→ Create Empty Series Object by using just Series() with no Parameter: To create an empty object, i.e., one having no values, we can just use Series() as:

<Series Object> = pd.Series()
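As a hedged sketch, an empty series can be created and checked like this. Passing dtype explicitly is an assumption added here, because some pandas versions warn when an empty Series is created without one.

```python
import pandas as pd

# An empty series: no values and no index labels.
empty_series = pd.Series(dtype="float64")

print(empty_series.empty)   # True
print(empty_series.size)    # 0
```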

→ Creating Non-empty Series Object: To create a non-empty series, we need to specify arguments for data and indexes as per the following syntax:

<Series Object> = pd.Series(data, index=idx)

where idx is a sequence of index labels, one for each data value, and data is the data part of the series object. The data can be given in any of the following ways:

→ Creation of Series from Scalar Value: A series can be created using scalar values as:

>>> import pandas as pd

>>> series1 = pd.Series([10, 20, 30])

>>> print(series1)

Output:

| Index | Data values |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |

dtype: int64
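The index argument from the syntax above can be sketched with the same values. The labels "a", "b" and "c" and the name series2 are chosen here only for illustration.

```python
import pandas as pd

# Same values as before, now with user-defined labels instead of 0, 1, 2.
series2 = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(series2)

# A value can then be looked up by its label.
print(series2["b"])   # 20
```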

→ Creation of Series from NumPy Arrays: We can create a series from one-dimensional NumPy array as:

>>> import numpy as np

>>> import pandas as pd

>>> array1 = np.array([11, 22, 33, 44])

>>> series1 = pd.Series(array1)

>>> print(series1)

Output:

| Index | Data values |
|---|---|
| 0 | 11 |
| 1 | 22 |
| 2 | 33 |
| 3 | 44 |

dtype: int32

→ Creation of Series from Dictionary: We can create a series by specifying indexes and values through a dictionary as:

>>> dict1 = {'Uttar Pradesh': 'Lucknow', 'Rajasthan': 'Jaipur'}

>>> print(dict1)

{'Uttar Pradesh': 'Lucknow', 'Rajasthan': 'Jaipur'}

>>> series1 = pd.Series(dict1)

>>> print(series1)

Output:

Uttar Pradesh Lucknow

Rajasthan Jaipur

dtype: object
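A short sketch of how dictionary keys map to the index, assuming pandas is imported as pd. The index argument and the label "Goa" are additions made here to show what happens when a label has no matching key.

```python
import pandas as pd

capitals = {"Uttar Pradesh": "Lucknow", "Rajasthan": "Jaipur"}

# Dictionary keys become the index of the series.
series_cap = pd.Series(capitals)
print(list(series_cap.index))

# With index=, values are picked by key in the given order; a label
# that is not a key in the dictionary gets a missing value (NaN).
series_sel = pd.Series(capitals, index=["Rajasthan", "Goa"])
print(series_sel)
```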

→ Accessing Elements of a Series: There are two common ways for accessing the elements of a series: Indexing and Slicing.

→ Indexing: Indexing in series is similar to that for NumPy arrays, and is used to access elements in a series. Indexes are of two types: positional index and labelled index. A positional index takes an integer value that corresponds to its position in the series, starting from 0, whereas a labelled index takes any user-defined label as index.

Example:

>>> seriesNum = pd.Series([11, 22, 33])

>>> seriesNum[1]

22

Here, the value 22 is displayed for the positional index 1.
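The two index types can be compared in a brief sketch, assuming a series with user-defined labels. The name series_cap is illustrative, and .iloc is used here as the explicit positional accessor, which the notes do not mention.

```python
import pandas as pd

series_cap = pd.Series(["Dispur", "Patna", "Panaji"],
                       index=["Assam", "Bihar", "Goa"])

# Labelled index: look a value up by its user-defined label.
by_label = series_cap["Bihar"]
print(by_label)      # Patna

# Positional index: iloc counts positions starting from 0.
by_position = series_cap.iloc[1]
print(by_position)   # Patna
```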

→ Slicing: This is similar to slicing used with NumPy arrays. We can define which part of the series is to be sliced by specifying the start and end parameters [start:end] with the series name. When we use positional indices for slicing, the value at the end index position is excluded, i.e., only (end – start) data values of the series are extracted.

Example:

>>> seriesCapState = pd.Series(['Dispur', 'Patna', 'Panaji'], index=['Assam', 'Bihar', 'Goa'])

>>> seriesCapState[1:2]

Bihar    Patna

dtype: object

Here, only the data value at index position 1 is displayed, i.e., the value at index position 2 is excluded.
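A hedged sketch contrasting positional and label slicing on the same data. The use of .iloc and the variable names are assumptions made for illustration. Unlike positional slicing, slicing by labels includes the value at the end label.

```python
import pandas as pd

series_cap = pd.Series(["Dispur", "Patna", "Panaji"],
                       index=["Assam", "Bihar", "Goa"])

# Positional slice: the end position 2 is excluded, so 2 - 0 = 2 values.
pos_slice = series_cap.iloc[0:2]
print(pos_slice)

# Label slice: the end label "Goa" is included, so all 3 values.
label_slice = series_cap["Assam":"Goa"]
print(label_slice)
```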

→ Attributes of Series: We can access certain properties called attributes of a series by using that property with the series name.

| Attribute Name | Purpose |
|---|---|
| name | assigns a name to the series |
| index.name | assigns a name to the index of the series |
| values | prints a list of the values in the series |
| size | prints the number of values in the series object |
| empty | prints True if the series is empty, and False otherwise |
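The attributes in the table can be tried out as follows. This is a minimal sketch in which the series and the names "Capitals" and "States" are illustrative.

```python
import pandas as pd

series_cap = pd.Series(["Dispur", "Patna", "Panaji"],
                       index=["Assam", "Bihar", "Goa"])

series_cap.name = "Capitals"       # name of the series
series_cap.index.name = "States"   # name of the index

print(series_cap.values)   # the data values
print(series_cap.size)     # 3
print(series_cap.empty)    # False
```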

→ Methods of Series:

| Method | Explanation |
|---|---|
| head(n) | Returns the first n members of the series. If the value for n is not passed, then by default n takes 5 and the first five members are displayed. |
| count() | Returns the number of non-NaN values in the series. |
| tail(n) | Returns the last n members of the series. If the value for n is not passed, then by default n takes 5 and the last five members are displayed. |
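A short sketch of these three methods, assuming a ten-element series that contains two NaN values. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

series_ten = pd.Series([1, 2, np.nan, 4, 5, 6, np.nan, 8, 9, 10])

print(series_ten.head(2))   # first two members
print(series_ten.head())    # first five members (default n)
print(series_ten.tail(3))   # last three members
print(series_ten.count())   # 8, because the two NaN values are not counted
```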

→ Mathematical Operations on Series:

→ Addition: We can use the ‘+’ operator or the add() method of series to perform addition between two series objects.

→ Subtraction: We can use the ‘-’ operator or the sub() method of series to perform subtraction between two series objects.

→ Division: We can use the ‘/’ operator or the div() method of series to perform division between two series objects.

→ Multiplication: We can use the ‘*’ operator or the mul() method of series to perform multiplication between two series objects.

→ Exponential Power: We can use the ‘**’ operator or the pow() method of series to raise each element of the caller series to the power of the corresponding element of the passed series and return the results.
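The operators and their method forms can be sketched as below. Two details shown here are not stated in the notes above: values are matched by index label rather than by position, and the method forms accept a fill_value argument for labels missing from one series. All names and values are illustrative.

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=["a", "b", "c"])
series_b = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Values are matched by index label. A label present in only one
# series gives NaN in the result.
total = series_a + series_b
print(total)

# fill_value is used in place of the value for a missing label.
total_filled = series_a.add(series_b, fill_value=0)
print(total_filled)

# The remaining operators and methods, on two series with the same labels.
x = pd.Series([2, 3, 4])
y = pd.Series([1, 2, 3])
print(x - y, x.sub(y))
print(x * y, x.mul(y))
print(x / y, x.div(y))
print(x ** y, x.pow(y))
```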

→ DataFrame Data Structure: A DataFrame is a two-dimensional labelled data structure, like a table in MySQL. It contains rows and columns, and therefore has both a row index and a column index. The row index is known as the index and the column index is called the column name.

→ Creation of DataFrame: There are a number of ways to create a DataFrame. Some of them are listed in this section.

→ Creation of an empty DataFrame: An empty DataFrame can be created as follows:

>>> import pandas as pd

>>> dFrameEmt = pd.DataFrame()

>>> dFrameEmt

Output:

Empty DataFrame

Columns: []

Index: []

→ Creation of DataFrame from NumPy ndarrays: Consider the following three NumPy ndarrays. Let us create a simple DataFrame without any column labels, using a single ndarray:

>>> import numpy as np

>>> array1 = np.array([11, 22, 33])

>>> array2 = np.array([110, 210, 310])

>>> array3 = np.array([-100, -200, -300, -400])

>>> dFrame4 = pd.DataFrame(array1)

>>> dFrame4

Output:

| | 0 |
|---|---|
| 0 | 11 |
| 1 | 22 |
| 2 | 33 |
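More than one ndarray can be combined in the same way. The sketch below assumes two arrays of equal length and uses column labels "A", "B" and "C" chosen for illustration. The name dFrame5 is also hypothetical.

```python
import numpy as np
import pandas as pd

array1 = np.array([11, 22, 33])
array2 = np.array([110, 210, 310])

# Each array becomes one row; columns= supplies the column labels.
dFrame5 = pd.DataFrame([array1, array2], columns=["A", "B", "C"])
print(dFrame5)
```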

→ Creation of DataFrame from List of Dictionaries: We can create DataFrame from a list of Dictionaries as:

>>> listDict = [{'a': 11, 'b': 22}, {'a': 5, 'b': 10, 'c': 20}]

>>> dFrameListDict = pd.DataFrame(listDict)

>>> dFrameListDict

Output:

| | a | b | c |
|---|---|---|---|
| 0 | 11 | 22 | NaN |
| 1 | 5 | 10 | 20.0 |

→ Creation of DataFrame from Dictionary of Lists: DataFrames can also be created from a dictionary of lists.

>>> dictForest = {'State': ['Kanpur', 'Delhi', 'Udaipur'], 'GArea': [96838, 7583, 44552], 'VDF': [3197, 4.42, 2563]}

>>> dFrameForest= pd.DataFrame(dictForest)

>>> dFrameForest

Output:

| | State | GArea | VDF |
|---|---|---|---|
| 0 | Kanpur | 96838 | 3197.00 |
| 1 | Delhi | 7583 | 4.42 |
| 2 | Udaipur | 44552 | 2563.00 |

→ Creation of DataFrame from Dictionary of Series: A dictionary of series can also be used to create a DataFrame as:

>>> ResultSheet = {

'Rohit': pd.Series([80, 92, 87], index=['English', 'Science', 'Maths']),

'Ayush': pd.Series([72, 81, 94], index=['English', 'Science', 'Maths']),

'Priya': pd.Series([84, 86, 78], index=['English', 'Science', 'Maths'])

}

>>> ResultDF = pd.DataFrame(ResultSheet)

>>> ResultDF

Output:

| | Rohit | Ayush | Priya |
|---|---|---|---|
| English | 80 | 72 | 84 |
| Science | 92 | 81 | 86 |
| Maths | 87 | 94 | 78 |

→ Operations on Rows and Columns in DataFrames:

→ Adding a New Column to a DataFrame: We can easily add a new column to a DataFrame.

→ Adding a New Row to a DataFrame: We can add a new row to a DataFrame using the DataFrame.loc[] method.

→ Deleting Rows or Columns from a DataFrame: We can use the DataFrame.drop() method to delete rows and columns from a DataFrame.

→ Renaming Row Labels of a DataFrame: We can change the labels of rows and columns in a DataFrame using the DataFrame.rename() method.

→ Renaming Column Labels of a DataFrame: To alter the column names of ResultDF, we can again use the rename() method.
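The operations listed above can be put together in one sketch. It assumes a smaller version of the result sheet, and the new row "Hindi", its marks and the replacement labels are made up for illustration.

```python
import pandas as pd

result_df = pd.DataFrame(
    {"Rohit": [80, 92, 87], "Ayush": [72, 81, 94]},
    index=["English", "Science", "Maths"],
)

# Add a new column by assigning one value per row.
result_df["Priya"] = [84, 86, 78]

# Add a new row with loc[], giving one value per column.
result_df.loc["Hindi"] = [90, 75, 88]

# Delete a row (axis=0) or a column (axis=1).
# drop() returns a new DataFrame and leaves result_df unchanged.
without_science = result_df.drop("Science", axis=0)
without_ayush = result_df.drop("Ayush", axis=1)

# Rename row labels and column labels with a mapping of old to new names.
renamed_rows = result_df.rename(index={"English": "Eng", "Maths": "Math"})
renamed_cols = result_df.rename(columns={"Rohit": "Student1"})

print(result_df)
print(renamed_rows)
print(renamed_cols)
```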

→ Accessing DataFrames Element through Indexing: Data elements in a DataFrame can be accessed using indexing. There are two ways of indexing DataFrames: Label Based Indexing and Boolean Indexing.

→ Label Based Indexing: There are several methods in Pandas to implement label based indexing. DataFrame.loc[] is an important method that is used for label based indexing with DataFrames.

→ Boolean Indexing: In boolean indexing, we can select the subsets of data based on the actual values in the DataFrame rather than their row/column labels. Thus, we can use conditions on column names to filter data values.

→ Accessing DataFrames Element through Slicing: We can use slicing to select a subset of rows and/or columns from a DataFrame. To retrieve a set of rows, slicing can be used with row labels.
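Label based indexing, boolean indexing and slicing can be sketched on the result sheet data used earlier. The variable names and the cut-off of 85 marks are assumptions made for illustration.

```python
import pandas as pd

result_df = pd.DataFrame(
    {"Rohit": [80, 92, 87], "Ayush": [72, 81, 94], "Priya": [84, 86, 78]},
    index=["English", "Science", "Maths"],
)

# Label based indexing with loc[]: a row, a column, a single value.
science_row = result_df.loc["Science"]
ayush_col = result_df.loc[:, "Ayush"]
priya_maths = result_df.loc["Maths", "Priya"]

# Boolean indexing: keep only the rows where the condition is True.
rohit_above_85 = result_df[result_df["Rohit"] > 85]

# Slicing with row labels: both the start and end labels are included.
first_two = result_df.loc["English":"Science"]

print(science_row)
print(rohit_above_85)
print(first_two)
```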

→ Attributes of DataFrames:

| Attribute Name | Purpose |
|---|---|
| DataFrame.index | to display row labels |
| DataFrame.columns | to display column labels |
| DataFrame.dtypes | to display data type of each column in the dataframe |
| DataFrame.values | to display a NumPy ndarray having all the values in the dataframe, without the axes labels |
| DataFrame.shape | to display a tuple representing the dimensionality of the dataframe |
| DataFrame.size | to display the total number of values in the dataframe (rows × columns) |
| DataFrame.T | to transpose the dataframe, i.e., row indices and column labels of the dataframe replace each other’s position |
| DataFrame.head(n) | to display the first n rows in the dataframe |
| DataFrame.tail(n) | to display the last n rows in the dataframe |
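A minimal sketch of these attributes and methods, assuming a two-column version of the forest data used earlier. The name forest_df is illustrative.

```python
import pandas as pd

forest_df = pd.DataFrame({
    "State": ["Kanpur", "Delhi", "Udaipur"],
    "GArea": [96838, 7583, 44552],
})

print(forest_df.index)     # row labels
print(forest_df.columns)   # column labels
print(forest_df.dtypes)    # data type of each column
print(forest_df.values)    # values without the axes labels
print(forest_df.shape)     # (3, 2): 3 rows and 2 columns
print(forest_df.size)      # 6 values in total
print(forest_df.T)         # rows and columns swap places
print(forest_df.head(2))   # first two rows
print(forest_df.tail(1))   # last row
```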