# Introduction

When working with data that has missing values (aka NA or *not available*), we have to be careful about the operations we perform. In this short article, we will look at the different NA objects you may encounter when working with the Pandas or NumPy libraries.

There are different null objects, such as **numpy.nan** (Not a Number; older NumPy releases also exposed it as **numpy.NaN**, an alias removed in NumPy 2.0), **pandas.NaT** (Not a Time), and Python's **None** object. Null objects may behave unexpectedly and cause a semantic error (aka logic error) that is hard to find and debug. Unlike a syntax error, a semantic error does not stop your program: it runs without complaint and silently produces wrong results.

*In this article, we will go over the following items:*

- Comparison of null objects (`==` vs. `is`)
- Finding null objects in Pandas & NumPy
- Calculations with missing values

*NOTE: Data imputation/wrangling techniques are not a part of this article (a topic for a future article).*

# Comparing null objects (`==` vs. `is`)

When comparing a Python object that may be NA, keep in mind the difference between Python's two comparison operators, `is` and `==`. The keyword `is` compares the *identities* of two variables (whether they refer to the very same object), while `==` compares two variables by checking whether their values are equal. Let's see how these two differ.

```python
None == None
# >>> True
None is None
# >>> True
```

When comparing Python's **None** object, both `==` and `is` yield the same result. However, the output differs when the **numpy.nan** null object is used!

```python
import numpy

numpy.nan == numpy.nan
# >>> False
numpy.nan is numpy.nan
# >>> True
```
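This is not a NumPy quirk: under the IEEE 754 floating-point standard, NaN compares unequal to everything, including itself. A minimal sketch showing the same behavior with a plain Python float, along with two checks that do work for a single value: the standard library's `math.isnan()` and the self-inequality test `x != x`, which is True only for NaN.

```python
import math

import numpy as np

nan = float("nan")  # a plain Python float NaN, created without NumPy

print(nan == nan)          # False: NaN is not equal to anything, not even itself
print(np.nan == np.nan)    # False for the same reason
print(math.isnan(nan))     # True: standard-library check for a float NaN
print(math.isnan(np.nan))  # True: np.nan is an ordinary Python float
print(nan != nan)          # True: only NaN is unequal to itself
```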

This behavior may result in a semantic error, particularly if we do an element-wise comparison. For example, assume that we have the list `data = [1.0, np.nan, 2.0]` and we want to print, for each element of **data**, whether it is a missing value.

```python
import numpy as np

data = [1.0, np.nan, 2.0]

# Using "==" in the element-wise comparison
for x in data:
    if x == np.nan:
        print(f"Using '==' --> {x} is a nan!")
    else:
        print(f"Using '==' --> {x} is not a nan!")

# Using "is" in the element-wise comparison
for x in data:
    if x is np.nan:
        print(f"Using 'is' --> {x} is a nan!")
    else:
        print(f"Using 'is' --> {x} is not a nan!")
```

```
Using '==' --> 1.0 is not a nan!
Using '==' --> nan is not a nan!
Using '==' --> 2.0 is not a nan!
Using 'is' --> 1.0 is not a nan!
Using 'is' --> nan is a nan!
Using 'is' --> 2.0 is not a nan!
```
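The `is` check only gave the expected answer here because the list holds the very same `np.nan` object we compared against. A NaN that comes out of a NumPy array or out of a calculation is a different object, so `x is np.nan` silently misses it. A minimal sketch of that pitfall:

```python
import numpy as np

arr = np.array([1.0, np.nan, 2.0])
from_array = arr[1]                      # indexing creates a new np.float64 scalar object
computed = float("inf") - float("inf")   # inf - inf evaluates to NaN

print(from_array is np.nan)  # False: a different object, even though it is NaN
print(computed is np.nan)    # False for the same reason
print(np.isnan(from_array))  # True
print(np.isnan(computed))    # True
```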

It is safer to use the built-in functions of Pandas and/or NumPy to check for missing values. We will cover this in the next section.

# Finding null objects in Pandas & NumPy

In NumPy, we can check for NaN entries with the *numpy.isnan()* function. It is designed for floating-point NaN values, so passing another null object such as **None** raises a `TypeError` instead of returning `True`.

```python
import numpy as np

np.isnan(np.nan)
# >>> True
np.isnan(None)
# >>> TypeError
```
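*numpy.isnan()* also works element-wise on floating-point arrays, which is usually how it is used in practice. The same limitation applies to arrays, though: an object-dtype array that contains **None** cannot be processed. A short sketch with made-up values:

```python
import numpy as np

floats = np.array([1.0, np.nan, 2.0])
mask = np.isnan(floats)  # element-wise boolean array: False, True, False
print(mask)

# An object array can hold None, but np.isnan has no loop for object dtype
mixed = np.array([1.0, None, 2.0], dtype=object)
try:
    np.isnan(mixed)
    raised_type_error = False
except TypeError:
    raised_type_error = True
print("TypeError raised:", raised_type_error)
```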

I suggest you use *pandas.isna()* or its alias *pandas.isnull()*, as they are more versatile than *numpy.isnan()* and accept other null objects such as **None** and **pandas.NaT**, not only *numpy.nan*.

```python
import numpy as np
import pandas as pd

# pandas.isnull() is an alias of pandas.isna()
pd.isna(np.nan)
# >>> True
pd.isna(None)
# >>> True
pd.isna(pd.NaT)
# >>> True
```
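*pandas.isna()* also works element-wise, and unlike *numpy.isnan()* it copes with object arrays that mix different null objects. DataFrames expose the same check as the `.isna()` method, which makes counting missing entries per column a one-liner. A short sketch with made-up example values:

```python
import numpy as np
import pandas as pd

# pandas.isna() works element-wise on array-like input, including object arrays
mixed = np.array([1.0, None, np.nan, "a"], dtype=object)
mask = pd.isna(mixed)
print(mask)  # None and np.nan are flagged; 1.0 and "a" are not

# On a DataFrame, .isna() returns a boolean frame; summing counts NAs per column
df = pd.DataFrame({
    "value": [1.0, np.nan, 3.0],
    "timestamp": [pd.Timestamp("2019-11-10"), pd.NaT, pd.NaT],
})
missing_per_column = df.isna().sum()
print(missing_per_column)
```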

# Calculations with missing data

Let me tell you a story that happened to me a few days ago. I wanted to calculate the median absolute deviation using `mad()` from the statsmodels library, which relies on NumPy's *median()* function. The data I was working on had NaN entries, and because there was at least one missing value in the input array, the result was NaN. It took me some time to track down this semantic error, and I learned the hard way that NaN silently propagates through calculations.

The following examples illustrate what happens when we compute with data without accounting for missing values:

```python
import numpy

2 + numpy.nan
# >>> nan
numpy.nan / 2
# >>> nan
```
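NaN also sneaks through comparisons, because every ordered comparison with NaN (`<`, `>`, and so on) returns False. A minimal sketch showing how this makes Python's built-in `max()` depend on where the NaN sits in the list, while `numpy.max()` propagates the NaN and `numpy.nanmax()` ignores it:

```python
import numpy as np

print(np.nan > 1.0, np.nan < 1.0)  # False False: ordered comparisons with NaN are always False

# Built-in max() keeps the current maximum unless a later item compares greater,
# so the result depends on the position of the NaN
nan_first = max([np.nan, 1.0, 2.0])  # nan: nothing compares greater than NaN
nan_last = max([1.0, 2.0, np.nan])   # 2.0: NaN never compares greater than 2.0
print(nan_first, nan_last)

print(np.max([1.0, 2.0, np.nan]))     # nan: NumPy propagates the NaN
print(np.nanmax([1.0, 2.0, np.nan]))  # 2.0: the NaN-aware variant ignores it
```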

You have to be cautious about NaNs in your data when calculating any statistic. For example, let's calculate the mean of an array containing a NaN.

```python
import numpy

numpy.mean([1.0, 2.0, 3.0, numpy.nan])
# >>> nan
numpy.nanmean([1.0, 2.0, 3.0, numpy.nan])
# >>> 2.0
```
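Pandas takes the opposite default: its reduction methods, such as `Series.mean()`, skip missing values unless told otherwise through the `skipna` parameter. A short sketch for comparison, using the same values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan])

print(s.mean())              # 2.0: NaN is skipped by default (skipna=True)
print(s.mean(skipna=False))  # nan: NaN propagates, as with numpy.mean()
print(s.count(), len(s))     # 3 4: count() reports only the non-missing entries
```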

NumPy functions that calculate statistics usually have NaN-aware counterparts that ignore NaN entries, such as numpy.nansum(), numpy.nanstd(), and numpy.nanmedian().
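Coming back to the median absolute deviation from the story above, a NaN-aware version can be sketched with `numpy.nanmedian()`. The helper `nan_mad()` below is hypothetical and computes the plain, unscaled MAD; it is not the statsmodels implementation, which by default also rescales the result by a normalization constant.

```python
import numpy as np


def nan_mad(values):
    """Hypothetical helper: unscaled median absolute deviation that ignores NaNs."""
    arr = np.asarray(values, dtype=float)
    center = np.nanmedian(arr)                 # median of the non-NaN entries
    return np.nanmedian(np.abs(arr - center))  # NaN deviations are ignored too


data = [1.0, 2.0, 3.0, 4.0, 100.0, np.nan]
print(np.median(data))  # nan: the plain median propagates the NaN
print(nan_mad(data))    # 1.0
```

The non-NaN values have a median of 3.0, their absolute deviations are 2, 1, 0, 1, and 97, and the median of those deviations is 1.0.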

# Recommendations

- Always keep in mind the difference between the comparison operators `==` and `is`.
- Use *Pandas* built-in functions such as *pandas.isna()* to check for NA entries.
- Pay attention to the behavior of functions in the presence of null objects, particularly functions that calculate statistical properties.

# Conclusion

I believe that the next time you work with null objects in Python, you will pay more attention to them. I hope you learned something useful from my first-ever article on Medium.com. Feel free to send me any feedback or suggestions.

# Useful Links

Working with missing data - pandas 1.2.3 documentation

The Difference Between "is" and "==" in Python - dbader.org

## Citation

```
@online{alizadeh2019,
author = {Esmaeil Alizadeh},
editor = {},
title = {Working with {Missing} {Values} in {Pandas} and {NumPy}},
date = {2019-11-10},
url = {https://ealizadeh.com/blog/working-with-missing-values-in-pandas-and-numpy},
langid = {en}
}
```