A2: Python Essentials (Part-1): Data types

Junaid Qazi, PhD
11 min read · Jan 29, 2020

This article is a part of “Data Science from Scratch — Can I to I Can” series.

Click here for the previous article/lecture on “A1: course introduction and environment setup”.

We all know that Python is the mainstream language for Data Science. We also know that learning Python for software development is very different from learning it for Data Science. Without getting into any unnecessary discussion, let’s move on to the primary goal and learn the key Python concepts that are an absolute necessity for doing Data Science.

Please note, the video lectures for this article are provided at the end!

This section is carefully designed for all levels. If you don’t have any programming experience in Python, don’t worry, it is still very easy to follow. If you already have a good knowledge of Python and/or working experience in any other language, you can skip this lecture/article or even the full section on Python Essentials. However, a quick review is recommended to brush up on the basics.

For those who want to explore more, the following resources are very useful:

python.org
learning-python’s documentation
Learning Python, 5th Ed. by Mark Lutz is a great resource book for beginners.

TIP: Writing comments while coding is a very good practice. Comments help a lot when you come back to your own code, because you can easily recall what you did before. They are especially helpful when you are working in a team and need to share the code.

1: Python data types

Let’s start with important Python data types:

  • Numbers
  • Strings & print formatting
  • Lists
  • Dictionaries
  • Tuples
  • Sets
  • Booleans

1.1 Numbers

Python has two basic number types, integer and float. For example, 2 is an integer and 2.0 is a floating-point number, which has a decimal point. We can perform arithmetic operations on either of these number types.

We can divide them with a forward slash / between them. (Note that in Python 3, dividing two integers with / gives a floating-point result.)

If you are using Python 2, you need to make at least one operand a float (e.g. 5 / 2.0) to get the same result.
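A minimal sketch of the number types and division described above, assuming Python 3. The variable names are only for illustration.

```python
# Assumes Python 3; the variable names are illustrative only.
int_sum = 2 + 3        # two integers give an integer
float_sum = 2 + 3.0    # an integer and a float give a float
quotient = 5 / 2       # "/" returns a float, even for two integers
even_split = 4 / 2     # still a float: 2.0, not 2

print(int_sum, float_sum, quotient, even_split)  # 5 5.0 2.5 2.0
```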

We can compute the power of a number (exponent) with two asterisks ** together.
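A quick sketch of the ** operator; the names cube and square_root are just illustrative.

```python
cube = 2 ** 3            # 2 * 2 * 2
square_root = 9 ** 0.5   # a fractional exponent works too

print(cube)         # 8
print(square_root)  # 3.0
```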

Python follows the order of arithmetic operations. For example, for 1 + 2 * 3 + 4, Python will first multiply 2 and 3 and then perform the additions.

A good practice is to use parentheses "( )" to tell Python which operation needs to be performed first and to make the order clear.

The operations in the parentheses will be performed first.
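The same expression with and without parentheses could look like the sketch below. The grouping chosen for the second line is only one possible example.

```python
without_parentheses = 1 + 2 * 3 + 4    # multiplication first: 1 + 6 + 4
with_parentheses = (1 + 2) * (3 + 4)   # parentheses first: 3 * 7

print(without_parentheses)  # 11
print(with_parentheses)     # 21
```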

The modulus or mod operation is "%" (the percent sign) in Python. It returns the remainder after a division.

With the mod operation, we can check whether a number is even or odd by dividing it by 2:

  • If number % 2 returns 0, the number is even.
  • If number % 2 returns 1, the number is odd.
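Here is a small sketch of the even/odd check. The helper is_even is a hypothetical name introduced only for this example.

```python
remainder = 7 % 3   # 7 = 2 * 3 + 1, so the remainder is 1

def is_even(number):
    """Return True when dividing `number` by 2 leaves no remainder."""
    return number % 2 == 0

print(remainder)    # 1
print(is_even(10))  # True
print(is_even(7))   # False
```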

1.2 Variables

On many occasions, we pick a variable name and assign an object or a value to it. We do this in Python using the assignment operator "=". For example, if we assign the value 5 to a variable "x", then whenever we use x, it gives 5.

We can perform arithmetic operations using these variables.

We can perform any arithmetic operation and assign the result to a new variable.

We can re-assign a value to a variable. This replaces the existing value with the new one.
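A minimal sketch of assignment, arithmetic with variables and re-assignment. The names x, y and total are illustrative.

```python
x = 5            # assign the value 5 to the name x
y = 10
total = x + y    # use the variables in arithmetic and store the result
print(total)     # 15

x = 20           # re-assignment replaces the old value of x
print(x)         # 20
print(total)     # 15 (total was computed earlier and does not change)
```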

It is common practice to use variable names made up of multiple words. The convention in Python is to separate the words with an underscore "_", which makes the names easy to read, e.g. total_profit, total_loss, first_name etc.

Important notes on variable names
Before we move on to the next data type, we should know:

In Python,

  • variable names cannot start with a number (e.g. 1var, 2x etc.)
  • variable names cannot start with special characters (e.g. *var, !y etc.); the underscore "_" is the one exception
  • variable names are case sensitive, “Name” is not the same as “name”
  • reserved words can’t be used as variable names, e.g. "class" is a reserved word in Python, so it can’t be used as a variable name, but "klass" and "Class" work fine.

Let’s try some invalid variable names!
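One way to try this without crashing the script is sketched below. The helper is_valid_name is hypothetical; it relies on the built-in str.isidentifier() method and the standard keyword module. The exact SyntaxError message depends on your Python version.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` can be used as a variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["1var", "*var", "class", "klass", "Class", "total_profit"]:
    print(candidate, is_valid_name(candidate))

# Actually using an invalid name is a SyntaxError; compile() lets us catch it.
try:
    compile("1var = 5", "<example>", "exec")
except SyntaxError as error:
    print("SyntaxError:", error.msg)
```

The first three candidates are rejected and the last three are accepted.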

1.3 Strings

Strings are one of the most popular and useful data types in Python. They can be created by enclosing characters in single ' ' or double " " quotes. Python treats single and double quotes in the same way. Strings are used to record text information (e.g. a person's name). In Python 3, arbitrary collections of bytes (e.g. the contents of an image file) are handled by the separate bytes type.

We can assign a string to a variable as well.

We can use the print() function to output the value of a variable. This is the standard way to display results in Python.
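A short sketch of creating strings and printing them. The variable names and the text are made up for illustration.

```python
single_quoted = 'Data Science'
double_quoted = "Data Science"
print(single_quoted == double_quoted)  # True (the quote style does not matter)

greeting = "Hello, Python!"   # assign a string to a variable
print(greeting)               # Hello, Python!
```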

Let me introduce a very useful built-in string method, format(), with an example at this stage.

If we have two variables

  • name = 'Qazi'
  • number = 30

using format(), we can put the values of name and number at specified locations in the string passed to print().

Take a string, create placeholders using curly brackets "{}", and pass that string to print() after calling format() on it with the values for the placeholders. Python fills the curly brackets "{}" with the values in the order in which they are given to format().

This is a convenient way of writing the desired values into the placeholders instead of typing those values (Qazi and 30) again and again.

For example:
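A minimal sketch using the two variables above. The wording of the sentence is an assumption made for this example.

```python
name = 'Qazi'
number = 30

# The placeholders are filled in the order the values are passed to format().
message = 'My name is {} and my number is {}.'.format(name, number)
print(message)  # My name is Qazi and my number is 30.
```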

There is another very useful way to write these values into placeholders, where we don’t need to worry about the order of the variables in the format() method: we give each placeholder a name. Let's try this!
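A sketch with named placeholders. The placeholder names n and num are arbitrary choices for this example.

```python
name = 'Qazi'
number = 30

# Named placeholders are matched by keyword, so the argument order is free.
message = 'My name is {n} and my number is {num}.'.format(num=number, n=name)
print(message)  # My name is Qazi and my number is 30.
```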

We can use more than one placeholder "{}" for the same variable/value.
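For instance, a named placeholder can appear several times in the same string. The sentence below is made up for illustration.

```python
name = 'Qazi'
number = 30

# Each name can be used as often as needed; every use gets the same value.
message = '{n} has number {num}. Call {n} on {num}!'.format(n=name, num=number)
print(message)  # Qazi has number 30. Call Qazi on 30!
```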

A little more about strings:

  • A string is a sequence (a positionally ordered collection) of characters.
  • A string maintains “left-to-right” order among the contained items.
  • Items in a string are stored and fetched by their relative position.
  • Strings are immutable in Python: they cannot be changed in place after they are created. In simple words, we can’t change a string by assigning to one of its positions, but we can always build a new string and assign it to the same name/variable.

As a string is a sequence, we can grab any of its characters using its index position (counting from 0) in square brackets, right? Let’s try!
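A sketch of indexing and of what immutability means in practice. The word 'Python' is just an example value.

```python
word = 'Python'
print(word[0])    # P (indexing starts at 0)
print(word[-1])   # n (negative indices count from the end)
print(word[1:4])  # yth (a slice from index 1 up to, but not including, 4)

# Strings are immutable: assigning to a position raises a TypeError.
try:
    word[0] = 'J'
except TypeError as error:
    print('TypeError:', error)

# Building a new string and re-using the same name works fine.
word = 'J' + word[1:]
print(word)       # Jython
```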

TIP: Press <shift+tab> to see the documentation string (docstring) while working in a Jupyter notebook! Watch the video lecture!

Some important methods to know:

So, I hope that you got a good understanding of strings. This is a good point to introduce some commonly used string methods such as lower(), upper(), split() etc. Let's see how they work with strings!

s.split() returns a list of the words in a string, using the sep (separator) parameter as the delimiter. If sep is not specified, any run of whitespace is a separator and empty strings are removed from the result.
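A quick sketch of these three methods on an example string. None of them changes the original string; each returns a new object.

```python
s = 'Hello World'

print(s.lower())     # hello world
print(s.upper())     # HELLO WORLD
print(s.split())     # ['Hello', 'World'] (split on whitespace)
print(s.split('o'))  # ['Hell', ' W', 'rld'] (split on the letter o)
print(s)             # Hello World (the original is unchanged)
```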

1.4 Lists

As discussed, a string is a sequence of items. Let’s move on with this idea to explore another very useful data type in Python, which is “list”. Some key points for the lists are:

  • Lists are positionally ordered collections of arbitrarily typed objects (they can take any datatype).
  • Lists are a sequence of elements separated by commas in a set of square brackets.
  • Lists have no fixed size.
  • Lists are mutable — unlike strings, lists can be modified.
  • Lists can be indexed and sliced, same as strings.
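The points above can be sketched as follows. The contents of my_list are arbitrary example values.

```python
my_list = ['a', 1, 2.5, True]   # mixed data types in one list

print(my_list[0])    # a
print(my_list[1:3])  # [1, 2.5]
print(len(my_list))  # 4
```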

A list can grow and shrink on demand.

  • We can add an element to the end of a list using the "append()" method.
  • The "pop()" method removes and returns the last item from the list.

We can re-assign the element at a position using its index. What if we want the first element in my_list to be 'hi'? Let’s try!
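A sketch of growing, shrinking and modifying a list in place. The starting values of my_list are an assumption made for this example.

```python
my_list = ['a', 'b', 'c']

my_list.append('d')        # grow: add an element at the end
print(my_list)             # ['a', 'b', 'c', 'd']

last_item = my_list.pop()  # shrink: remove and return the last element
print(last_item)           # d
print(my_list)             # ['a', 'b', 'c']

my_list[0] = 'hi'          # re-assign the first element by index
print(my_list)             # ['hi', 'b', 'c']
```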

Nesting in Lists

We can nest a list within a list and grab any element from the nested list. This is a very useful feature, often used to represent matrices or multidimensional arrays.
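A minimal sketch of a 3 x 3 matrix stored as a list of lists. The name matrix and its values are illustrative.

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

print(matrix[1])     # [4, 5, 6] (the second row)
print(matrix[1][2])  # 6 (the third element of the second row)
```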

1.5 Dictionaries

We have learnt about strings and lists, which are related to each other: both are sequences of items, and we can use the same index notation to grab elements from them.

Let’s talk about dictionaries now.

  • Dictionaries are something completely different as they are not sequences at all.
  • Dictionaries are key-value pairs in curly brackets "{}".
  • Dictionaries store objects by ‘key’ instead of by relative position, so items cannot be fetched by position; they simply map each key to its associated value. (Since Python 3.7, dictionaries do remember the order in which keys were inserted.)
  • Dictionaries can grow and shrink on demand, like lists.

We can access any value in the dictionary using its key.
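A sketch of creating a dictionary, looking up a value by key, and growing and shrinking it. The keys and values are made up for this example.

```python
dic = {'key1': 'value1', 'key2': 123}

print(dic['key1'])         # value1

dic['key3'] = 4.5          # grow: add a new key-value pair
removed = dic.pop('key2')  # shrink: remove a key and get its value back
print(removed)             # 123
print(dic)                 # {'key1': 'value1', 'key3': 4.5}
```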

Unlike in lists and strings, a positional index such as "dic[0]" is not going to work for dictionaries and will raise an error (unless 0 happens to be one of the keys).
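The error in question is a KeyError. The sketch below catches it so the script keeps running; the flag lookup_failed is only there for illustration.

```python
dic = {'key1': 'value1', 'key2': 123}

lookup_failed = False
try:
    dic[0]                     # 0 is not a key in this dictionary
except KeyError as error:
    lookup_failed = True
    print('KeyError:', error)  # KeyError: 0
```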

Dictionaries can take any item as their values, e.g. we can have a list as value with its associated key in the dictionary.

We can index the list after accessing it with a key in the dictionary.
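A small sketch with a list stored as a dictionary value. The keys k1 and k2 are arbitrary.

```python
dic = {'k1': [10, 20, 30], 'k2': 'text'}

numbers = dic['k1']   # first fetch the list by its key
print(numbers)        # [10, 20, 30]
print(dic['k1'][1])   # 20 (key lookup followed by a list index)
```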

We can nest a dictionary within a dictionary.

Now we can access any element of the list that is stored in the nested dictionary.
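A sketch of a list inside a dictionary inside a dictionary. The key names outer and inner are hypothetical and chosen only to make the nesting visible.

```python
dic = {'outer': {'inner': [1, 2, 3]}}

print(dic['outer'])              # {'inner': [1, 2, 3]}
print(dic['outer']['inner'])     # [1, 2, 3]
print(dic['outer']['inner'][2])  # 3
```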

keys() and values() are very useful built-in dictionary methods. They return all the keys and all the values of a dictionary, respectively.
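A quick sketch, assuming Python 3.7 or later so that the keys come back in insertion order. Both methods return view objects, so list() is used here to display them as plain lists.

```python
dic = {'a': 1, 'b': 2, 'c': 3}

print(list(dic.keys()))    # ['a', 'b', 'c']
print(list(dic.values()))  # [1, 2, 3]
```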

1.6 Tuples

So, moving forward, let’s talk about tuples now!

Tuples are:

  • similar to lists but cannot be changed.
  • sequences, like lists, but immutable, like strings.
  • used to represent a fixed collection of items.
  • coded in parentheses "()" instead of square brackets.

Tuples are immutable, so their items can’t be re-assigned. We can’t do an operation like t[0] = 'New' on a tuple, although such operations are possible on lists.
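A sketch comparing a tuple with a list. The values are arbitrary, and the TypeError is caught so the example runs to the end.

```python
t = (1, 2, 3)
print(t[0])   # 1 (indexing works just like in lists)

# Item assignment is not allowed on a tuple.
try:
    t[0] = 'New'
except TypeError as error:
    print('TypeError:', error)

# The same operation is fine on a list.
my_list = [1, 2, 3]
my_list[0] = 'New'
print(my_list)  # ['New', 2, 3]
```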

1.7 Sets
Sets are the next data type that I want to introduce here. A set is a collection of unique elements enclosed in a pair of curly brackets "{}". (An empty pair "{}" creates a dictionary, so an empty set is written as set().) A very simple concept from our school maths!

An item appears only once in a set, no matter how many times it is added.
For example, if we put multiple copies of the same element in a set, they will be reduced to a single unique element in that set.

We can use the add() method to add a new element to a set.

If we add 5 again, it will not give an error. The output will remain the same, as a set can only have unique elements.
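A minimal sketch of this behaviour. The starting values are an assumption, and sorted() is used only to print the elements in a predictable order.

```python
s = {1, 1, 2, 2, 3, 3}   # duplicates collapse to unique elements
print(sorted(s))         # [1, 2, 3]

s.add(5)
s.add(5)                 # adding 5 again is harmless
print(sorted(s))         # [1, 2, 3, 5]
print(len(s))            # 4
```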

1.8 Booleans
Booleans are simply True and False with capital T and capital F — just a customized version of 1 and 0.
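The link between Booleans and 1 and 0 can be checked directly, as in this sketch. The name is_adult and the numbers in the comparison are made up.

```python
print(True == 1)              # True
print(False == 0)             # True
print(True + True)            # 2
print(isinstance(True, int))  # True (bool is a subclass of int)

is_adult = 30 > 18            # comparisons produce Booleans
print(is_adult)               # True
```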

Great job! Here are the video lectures for this article!

A quick review of what we have learned so far: in this article/lecture, we have gone through the basic Python data types, namely numbers, strings & print formatting, lists, dictionaries, tuples, sets and Booleans. We have learned these data types with examples. If you are interested in exploring more, please read the Python documentation or consult any introductory book on Python. There are lots of good and free books available on Python.
I hope that you have enjoyed this article/lecture. Let’s move on to the next article/lecture and learn other important concepts for this course.

Please click here for the next article/lecture (A3: Python Essentials Part-2) in this series, thanks!

Note: This complete course is available on Udemy and SkillShare. You are encouraged to acknowledge the effort and buy the course from these platforms. Stay tuned, a book is also in draft!

About Dr. Junaid Qazi:

Dr. Junaid Qazi is a Subject Matter Specialist, Data Science & Machine Learning Consultant. He is a Professional Development Coach, Mentor, Author, and Invited Speaker. He can be reached for consulting projects and/or professional development training via LinkedIn (https://www.linkedin.com/in/jqazi/) or through his company website (www.scienceacademy.ca)
