A13: Pandas (Practice Exercises >> 2: City of Chicago Payroll Data)
This article is a part of “Data Science from Scratch — Can I to I Can” A Lecture Notes Book Series.
✅ A Suggestion: Open a new jupyter notebook and type the code while reading this article, doing is learning, and yes, “PLEASE Read the comment, they are very useful…..!”
Let’s explore real payroll data of Chicago city. This data is available on kaggle, however, a copy is included on the provided link as well. Please follow the instructions and answer the questions accordingly. Good luck!
These are links to few similar datasets for your self-learning.
☞ City payroll information for all Los Angeles employees since 2013
☞ San Francisco city employee salary data
☞ New York City — Citywide Payroll Data
Feel free to consult the solutions if needed. Please note, the tasks given in the exercises, can be solved in different ways. Try your best answer and compare with the solutions.
Exercises:
(Solutions are provided at the end.)
Here is the link to read the data from github:
# Data link at github
City_of_Chicago_Payroll_Data=“https://raw.githubusercontent.com/junaidqazi/DataSets_Practice_ScienceAcademy/master/City_of_Chicago_Payroll_Data.csv”
- Read City of Chicago Payroll Data in a dataframe
pay
. - Show first five records in your data.
- Get the overview of your data.
- How many ‘NaN’ (missing data) you have in each column of your data? What is the % of missing data in each column?
Hint: useisnull()
- Get the statistics for your dataset. Do you see lot’s of NaN? Why?
- What are the maximum, minimum and average Typical Hours? (use ‘Typical Hours’ column)
Do you remember, pandas ignoreNA/NaN/NULL
values by default?pay['Typical Hours'].dropna().mean() should gives the same mean :) You may want to confirm!
- How many employees are on salary and how many are working on hourly basis?
Hint: Group the data on ‘Salary or Hourly’ column. Instead of whole dataset, you can take the single column as input as well. - Which department has maximum no of employees?
- How many employees are on Salary and how many are on Hourly in the Police department?
- What are the mean, max and min salaries?
use.str and replace()
Create a new column name "Salaries" and separate the number from dollar sign as float .
Hint: pandas.series.str and Stackoverflow link - Find an employee who has the maximum salary.
Don’t use the max number you got above, write a code for general condition.
Try withidxmax()
as well! - Find an employee who has the minimum salary.
Useidxmin()
as well! - What are the mean, max and min Hourly Rate?
use.str and replace()
Hint: Once again, create a new column name"H_Rate"
and separate the number from dollar sign as float . - How many employees are getting max Hourly Rate?
- Who is getting max Hourly Rate?
- How many employees are earning less than the average Hourly Rate?
- How many employees are paid hourly and they have full-time job?
- Find the full-time employees who are working at hourly rate of
$10.00
? - How many unique Job Titles are there in the data?
- What was the top 10 average Salary of the employees in each department?
- What is the job title of
'AGAR, BULENT B'
?
please note, there are two spaces between AGAR, and BULENT. - What are the top most three common job titles?
- How many people have the word ‘officer’ in their job title?
(try writing a function and useapply()
)
Solutions:
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
Keep practicing to brush-up and add new skills.
Excellent work!
Your clap and share can help us to reach to someone who is struggling to learn these concepts.
Good luck!
See you in the next lecture on “A14: Matplotlib Essentials”.
Note: This complete course, including video lectures and jupyter notebooks, is available on the following links:
Dr. Junaid Qazi is a Subject Matter Specialist, Data Science & Machine Learning Consultant and a Team Builder. He is a Professional Development Coach, Mentor, Author, and Invited Speaker. He can be reached for consulting projects and/or professional development training via LinkedIn.