Background
Objectives
Importing the necessary packages
Data Loading & Overview
Exploring The Data
Conclusions
Background
Honeybee Population Declining
In 2006, the decline in the honeybee population was becoming a concern, as honeybees have an integral place in American honey agriculture.

A Recognized Disorder

Large numbers of hives were lost to Colony Collapse Disorder, a phenomenon in which worker bees disappear and the remaining hive colony collapses. Speculation about the cause of this disorder points to hive diseases and pesticides harming the pollinators, though no overall consensus has been reached. The U.S. previously produced more than half of the honey it consumed each year. Since then, honey has become primarily imported, with 350 of the 400 million pounds of honey consumed every year coming from imports.

Investigating The Data

This dataset provides insight into honey production supply and demand in America from 1998 to 2016.

Objectives

To visualize how honey production has changed over the years (1998–2016) in the United States.

Key questions to be answered:

How has honey production yield changed from 1998 to 2016?
Over time, what have been the major production trends across the states?
Is there any pattern between total honey production and the value of production each year? How has the value of production, which in some sense could be tied to demand, changed from year to year?

Importing the necessary packages

In [1]:

# NOTE: for codelab cloud env
# !pip install numpy==1.25.2 pandas==1.5.3 matplotlib==3.7.1 seaborn==0.13.1 -q --user

In [2]:

# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Notebook magic command to display the graphs inline
%matplotlib inline

# To suppress scientific notation when displaying numbers
pd.set_option('display.float_format', lambda x: '%.2f' % x)

Data Loading & Overview

In [3]:

honeyprod = pd.read_csv("honeyproduction1998-2016.csv")

In [4]:

honeyprod.head()

Out [4]:

	state	numcol	yieldpercol	totalprod	stocks	priceperlb	prodvalue	year
0	Alabama	16000.00	71	1136000.00	159000.00	0.72	818000.00	1998
1	Arizona	55000.00	60	3300000.00	1485000.00	0.64	2112000.00	1998
2	Arkansas	53000.00	65	3445000.00	1688000.00	0.59	2033000.00	1998
3	California	450000.00	83	37350000.00	12326000.00	0.62	23157000.00	1998
4	Colorado	27000.00	72	1944000.00	1594000.00	0.70	1361000.00	1998

state: Name of the U.S. state
year: Year of production
stocks: Stocks held by producers. The unit is pounds.
numcol: Number of honey-producing colonies, i.e., the maximum number of colonies from which honey was taken during the year. Honey can be taken from colonies that did not survive the entire year.
yieldpercol: Honey yield per colony. The unit is pounds.
totalprod: Total production (numcol x yieldpercol). The unit is pounds.
priceperlb: Average price per pound based on expanded sales. The unit is dollars.
prodvalue: Value of production (totalprod x priceperlb). The unit is dollars.
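
The two derived columns described above can be sanity-checked against their stated formulas. The sketch below is only illustrative: it rebuilds the five rows shown in the head() output by hand (so it runs without the CSV file), and the tolerance is an assumed value chosen because the reported prodvalue figures appear to be rounded to the nearest thousand dollars.

```python
import numpy as np
import pandas as pd

# The five rows from the head() output, typed in by hand for illustration.
sample = pd.DataFrame({
    "state": ["Alabama", "Arizona", "Arkansas", "California", "Colorado"],
    "numcol": [16000.0, 55000.0, 53000.0, 450000.0, 27000.0],
    "yieldpercol": [71, 60, 65, 83, 72],
    "totalprod": [1136000.0, 3300000.0, 3445000.0, 37350000.0, 1944000.0],
    "priceperlb": [0.72, 0.64, 0.59, 0.62, 0.70],
    "prodvalue": [818000.0, 2112000.0, 2033000.0, 23157000.0, 1361000.0],
})

# totalprod should equal numcol x yieldpercol.
totalprod_ok = np.isclose(
    sample["numcol"] * sample["yieldpercol"], sample["totalprod"]
)

# prodvalue should equal totalprod x priceperlb, up to rounding.
# TOLERANCE is an assumption, not something stated in the dataset notes.
TOLERANCE = 1000
prodvalue_ok = np.isclose(
    sample["totalprod"] * sample["priceperlb"],
    sample["prodvalue"],
    rtol=0,
    atol=TOLERANCE,
)

print("totalprod matches formula:", totalprod_ok.all())
print("prodvalue matches formula:", prodvalue_ok.all())
```

On the full dataset the same comparison could be run on honeyprod instead of sample; a larger tolerance may be needed there, since priceperlb is itself rounded.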

Dataset Shape

In [5]:

dataShape = honeyprod.shape
print(f'rows: {dataShape[0]}\ncols: {dataShape[1]}')

rows: 785
cols: 8

Column DataTypes

In [6]:

honeyprod.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 785 entries, 0 to 784
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   state        785 non-null    object 
 1   numcol       785 non-null    float64
 2   yieldpercol  785 non-null    int64  
 3   totalprod    785 non-null    float64
 4   stocks       785 non-null    float64
 5   priceperlb   785 non-null    float64
 6   prodvalue    785 non-null    float64
 7   year         785 non-null    int64  
dtypes: float64(5), int64(2), object(1)
memory usage: 49.2+ KB

There is one object datatype column (state) and 7 numerical columns (5 float64 and 2 int64)
All the columns have 785 non-null observations, which means none of the columns has null values
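
The absence of null values can also be checked directly rather than read off the info() output. This is a minimal sketch on a hand-built three-column sample taken from the head() output; on the real data the same calls would be made on honeyprod.

```python
import pandas as pd

# Small hand-built sample (values copied from the head() output).
sample = pd.DataFrame({
    "state": ["Alabama", "Arizona", "Arkansas", "California", "Colorado"],
    "numcol": [16000.0, 55000.0, 53000.0, 450000.0, 27000.0],
    "year": [1998, 1998, 1998, 1998, 1998],
})

# Count missing values in each column.
missing_per_column = sample.isnull().sum()

# True if any column contains at least one missing value.
has_missing = bool(missing_per_column.any())

print(missing_per_column)
print("Any missing values:", has_missing)
```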

Statistical summary

In [7]:

honeyprod.describe()

Out [7]:

	numcol	yieldpercol	totalprod	stocks	priceperlb	prodvalue	year
count	785.00	785.00	785.00	785.00	785.00	785.00	785.00
mean	61686.62	60.58	4140956.69	1257629.30	1.70	5489738.85	2006.82
std	92748.94	19.43	6884593.86	2211793.82	0.93	9425393.88	5.49
min	2000.00	19.00	84000.00	8000.00	0.49	162000.00	1998.00
25%	9000.00	46.00	470000.00	119000.00	1.05	901000.00	2002.00
50%	26000.00	58.00	1500000.00	391000.00	1.48	2112000.00	2007.00
75%	65000.00	72.00	4096000.00	1380000.00	2.04	5559000.00	2012.00
max	510000.00	136.00	46410000.00	13800000.00	7.09	83859000.00	2016.00

The number of colonies per state is spread over a huge range, from 2,000 to 510,000
The average number of colonies (61,686.62) is close to the 75th percentile (65,000) and well above the median (26,000), indicating a right skew
The standard deviation of the numcol column (92,748.94) is very high, about 1.5 times the mean
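
The skew and spread noted above can be quantified with a few pandas calls. The sketch below uses only the five numcol values from the head() output, so the numbers it prints will differ from those of the full dataset; it is meant to show the calls, not to reproduce the summary.

```python
import pandas as pd

# numcol values of the five rows shown in the head() output.
numcol = pd.Series([16000.0, 55000.0, 53000.0, 450000.0, 27000.0], name="numcol")

mean = numcol.mean()
median = numcol.median()

# Sample skewness: a positive value indicates a right (positive) skew.
skewness = numcol.skew()

# Coefficient of variation: standard deviation relative to the mean.
coeff_of_variation = numcol.std() / mean

print(f"mean: {mean:.2f}, median: {median:.2f}")
print(f"skewness: {skewness:.2f}")
print(f"std / mean: {coeff_of_variation:.2f}")
```

A mean well above the median together with a positive skewness is the usual signature of a right-skewed distribution.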

Exploring The Data

Count of Colonies

In [8]:

plt.figure(figsize=(15, 7))
sns.histplot(data=honeyprod, x="numcol", kde=True);

In [9]:

sns.boxplot(data=honeyprod, x="numcol");

Correlation Between Variables

	numcol	yieldpercol	totalprod	stocks	priceperlb	prodvalue
numcol	1.00	0.22	0.95	0.82	-0.21	0.90
yieldpercol	0.22	1.00	0.38	0.36	-0.36	0.26
totalprod	0.95	0.38	1.00	0.88	-0.24	0.90
stocks	0.82	0.36	0.88	1.00	-0.28	0.71
priceperlb	-0.21	-0.36	-0.24	-0.28	1.00	-0.06
prodvalue	0.90	0.26	0.90	0.71	-0.06	1.00
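
The table above has the shape of a pairwise correlation matrix of the numeric columns, with year left out. The cell that produced it is not shown, so the following is only a sketch of one way such a matrix might be computed, assuming the default Pearson correlation. It runs on the five rows from the head() output, so its values will not match the table.

```python
import pandas as pd

# The five rows from the head() output, typed in by hand for illustration.
sample = pd.DataFrame({
    "state": ["Alabama", "Arizona", "Arkansas", "California", "Colorado"],
    "numcol": [16000.0, 55000.0, 53000.0, 450000.0, 27000.0],
    "yieldpercol": [71, 60, 65, 83, 72],
    "totalprod": [1136000.0, 3300000.0, 3445000.0, 37350000.0, 1944000.0],
    "stocks": [159000.0, 1485000.0, 1688000.0, 12326000.0, 1594000.0],
    "priceperlb": [0.72, 0.64, 0.59, 0.62, 0.70],
    "prodvalue": [818000.0, 2112000.0, 2033000.0, 23157000.0, 1361000.0],
    "year": [1998, 1998, 1998, 1998, 1998],
})

# Drop the label column and the year, then compute pairwise Pearson correlations.
numeric_cols = sample.drop(columns=["state", "year"])
corr_matrix = numeric_cols.corr()

print(corr_matrix.round(2))
```

With seaborn already imported, the matrix could also be passed to sns.heatmap(corr_matrix, annot=True) for a visual version.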

Top 10 States by Total Production (1998–2016)

	state	totalprod
0	North Dakota	624435000.00
1	California	390315000.00
2	South Dakota	344361000.00
3	Florida	297798000.00
4	Montana	210125000.00
5	Minnesota	175432000.00
6	Texas	137832000.00
7	Wisconsin	95067000.00
8	Michigan	93788000.00
9	Idaho	78362000.00

Bottom 10 States by Total Production (1998–2016)

	state	totalprod
34	New Mexico	7147000.00
35	Vermont	6720000.00
36	West Virginia	5615000.00
37	Maine	5256000.00
38	Virginia	4837000.00
39	Nevada	4832000.00
40	Kentucky	4263000.00
41	South Carolina	3174000.00
42	Maryland	1266000.00
43	Oklahoma	1207000.00
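
The two tables above rank states by totalprod. The largest entries far exceed the single-year maximum of 46,410,000 from the statistical summary, so the values appear to be summed over all years. The original cell is not shown; the sketch below is one plausible way to build such a ranking, run on the five rows from the head() output (one year per state, so each sum is just that year's value).

```python
import pandas as pd

# Five rows from the head() output, typed in by hand for illustration.
sample = pd.DataFrame({
    "state": ["Alabama", "Arizona", "Arkansas", "California", "Colorado"],
    "totalprod": [1136000.0, 3300000.0, 3445000.0, 37350000.0, 1944000.0],
    "year": [1998, 1998, 1998, 1998, 1998],
})

# Sum production per state, sort from largest to smallest,
# and turn the result back into a two-column DataFrame.
state_totals = (
    sample.groupby("state")["totalprod"]
    .sum()
    .sort_values(ascending=False)
    .reset_index()
)

# The tables above show 10 rows each; 2 keeps this toy output short.
N = 2
print(state_totals.head(N))  # highest-producing states
print(state_totals.tail(N))  # lowest-producing states
```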

Top 10 States by Price per Pound (Summed Over Years)

	state	priceperlb
0	Virginia	55.36
1	Illinois	50.47
2	North Carolina	47.56
3	Kentucky	46.51
4	Tennessee	44.64
5	West Virginia	43.61
6	New Jersey	41.25
7	Vermont	40.43
8	Maine	38.42
9	Ohio	38.26

Bottom 10 States by Price per Pound (Summed Over Years)

	state	priceperlb
34	South Dakota	24.80
35	North Dakota	24.56
36	Nevada	24.41
37	Arkansas	24.35
38	Mississippi	23.88
39	Louisiana	23.77
40	New Mexico	19.77
41	South Carolina	16.57
42	Maryland	9.37
43	Oklahoma	8.74
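
The priceperlb values in the two tables above are larger than the maximum single-year price of 7.09 reported in the statistical summary, which suggests they are per-state sums across years rather than individual prices. That reading is an inference, since the original cell is not shown. Under that assumption, a ranking of this kind might be produced as follows, again using only the five rows from the head() output.

```python
import pandas as pd

# Five rows from the head() output, typed in by hand for illustration.
sample = pd.DataFrame({
    "state": ["Alabama", "Arizona", "Arkansas", "California", "Colorado"],
    "priceperlb": [0.72, 0.64, 0.59, 0.62, 0.70],
    "year": [1998, 1998, 1998, 1998, 1998],
})

# Assumed aggregation: sum of yearly prices per state, sorted descending.
price_by_state = (
    sample.groupby("state")["priceperlb"]
    .sum()
    .sort_values(ascending=False)
    .reset_index()
)

print(price_by_state)
```

Replacing .sum() with .mean() would give an average price per pound for each state, which may be easier to interpret as a price and is not affected by how many years a state reports.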
