Lake Mead’s Water Crisis: Drought, Climate Change, and Innovative Solutions

32 minute read

Published: September 29, 2024

This is a blog post for the HiMCM 2021 Problem B. The blog post will cover the Lake Mead water crisis, including the factors affecting water volume, modeling water levels using SARIMAX and structural time series models, and developing innovative solutions to address the drought problem. The analysis will involve importing data, exploratory data analysis, time series analysis, and forecasting future water levels using mathematical models.

For me, I never think that I will keep writing blog posts every week since the workload is getting heavier and heavier. At first, I thought that I will only write blog posts unfrequently, but for the whole September, I have been writing blog posts every week. I am glad that I can keep up with the schedule and I hope that I can keep up with the schedule for the next month as well.

For my readers, I hope that you can enjoy reading my blog posts and truly understand the content that I am writing. I hope that my blog posts can be inspirational for everyone who is interested in science and mathematics. though I am not a professional in this field, I hope that I can provide some insights and knowledge to my readers.

Problem Background

Lake Mead, the largest water reservoir in the United States, is facing an unprecedented crisis. Located on the Nevada-Arizona border, this crucial Colorado River reservoir has reached its lowest level since its creation in the 1930s. In the summer of 2021, Lake Mead’s water level dropped to just 36% of its full capacity.

This alarming situation is the result of several factors:

Prolonged drought exacerbated by climate change
Increasing water demand from the 25 million people Lake Mead serves
Reduced inflow from the Colorado River
Continued water consumption and evaporation

The severity of the situation led the Bureau of Reclamation to declare the first-ever water shortage on the Colorado River in August 2021. This Tier 1 water shortage declaration results in reduced water deliveries to Arizona, Nevada, and New Mexico, with agricultural communities being the first to feel the impacts.

As researchers and policymakers grapple with this crisis, they’re exploring multiple avenues, including analyzing historical water level data, developing predictive models, and investigating potential solutions like wastewater recycling.

Analysis

To address the Lake Mead drought problem, we need to develop mathematical models that can predict future water levels and assess the potential impact of solutions like wastewater recycling. Our analysis will focus on two key aspects:

Factors Affecting Lake Mead’s Water Volume:

The volume of water in Lake Mead is influenced by:

Inflow: Primarily from the Colorado River (96%) and other tributaries
Outflow: Water releases through dams and direct consumption
Loss: Evaporation from the lake surface

One may express the relationship mathematically as follows:

$\Delta V = \text{Inflow} - \text{Outflow} - \text{Evaporation}$

Inflow can be modeled using historical data and climate forecasts.
Outflow is determined by water demand, environmental regulations, and hydroelectric power generation.
Evaporation rates can be estimated based on historical weather data and lake surface area.
The change in volume ($\Delta V$) over time will give us insights into the reservoir’s water level trends.
We can use time series analysis and predictive modeling to forecast future water levels under different scenarios.

Modeling Water Levels:

We can develop two mathematical models to predict Lake Mead’s water levels:

SARIMAX Model: Seasonal Autoregressive Integrated Moving Average with Exogenous Variables (SARIMAX) is a time series model that incorporates seasonal patterns, autocorrelation, and external factors like climate data. By fitting a SARIMAX model to historical water level data, we can forecast future water levels and assess the impact of inflow, outflow, and evaporation on the reservoir.
Structural Time Series: Structural time series models decompose the observed data into trend, seasonal, and error components. By analyzing these components, we can identify patterns and make predictions about future water levels.

Using SARIMAX model for time series analysis problem, it can capture both trend and seasonality in the data. The model can be used to forecast future water levels based on historical data and external factors like climate forecasts. By analyzing the model’s parameters and residuals, we can evaluate its accuracy and make informed decisions about water management strategies.

The general form of a SARIMAX model is given by:

\[\text{SARIMAX}(p, d, q) \times (P, D, Q)_m\]

where:

$p$ is the autoregressive order: the number of lagged observations included in the model
$d$ is the differencing order: the number of times the data is differenced to achieve stationarity
$q$ is the moving average order: the number of lagged forecast errors included in the model
$P, D, Q$ are the seasonal autoregressive, differencing, and moving average orders
$m$ is the seasonal period (e.g., 12 for monthly data)

While on the other hand, the structural time series model can be expressed as:

\[y_t = \mu_t + \gamma_t + \varepsilon_t\]

where:

$y_t$ is the observed data at time $t$
$\mu_t$ is the trend component: captures the long-term changes in the data
$\gamma_t$ is the seasonal component: captures the periodic fluctuations
$\varepsilon_t$ is the error term: captures the random fluctuations
By estimating the parameters of the trend and seasonal components, we can forecast future water levels and assess the impact of different factors on the reservoir.

Finally, we establish an ensemble model that combines the predictions from both SARIMAX and structural time series models to improve the accuracy and robustness of our forecasts. By leveraging the strengths of each model and accounting for their respective weaknesses, we can generate more reliable predictions and make informed decisions about water management strategies.

Solution

Now, we will outline a comprehensive solution to the Lake Mead water crisis, incorporating mathematical modeling, data analysis, and innovative solutions:

Step 1: Importing Libraries and Data

We first import the necessary Python libraries for time series analysis and data visualization. We then load historical water level data for Lake Mead to analyze trends and patterns.

# Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.statespace.structural import UnobservedComponents
from statsmodels.tsa.seasonal import seasonal_decompose
from pmdarima import auto_arima
from statsmodels.tsa.stattools import adfuller
import warnings

# Setting up Matplotlib
plt.rcParams['figure.figsize'] = (16, 14)
plt.rcParams['font.size'] = 16
plt.rcParams['axes.labelsize'] = 20
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['xtick.labelsize'] = 16
plt.rcParams['ytick.labelsize'] = 16
plt.rcParams['legend.fontsize'] = 16
plt.rcParams['figure.titlesize'] = 20

# Warning settings
warnings.filterwarnings("ignore")

Then we first load the monthly elevation data for Lake Mead from 1935 to 2021 and visualize the time series to identify trends and patterns.

# Load the data
df_monthly = pd.read_excel('2021_HiMCM_LakeMead_MonthlyElevationData.xlsx')
df_monthly.head()

Step 2: Exploratory Data Analysis

We first preprocess the data since the whole excel file is in chaos. We then visualize the monthly water level data to identify trends, seasonality, and any irregular patterns.

# Drop the last two columns
df_monthly = df_monthly.drop(columns=['Unnamed: 13', 'Unnamed: 14'])
df_monthly.head()

# Change the column names
df_monthly.columns = ['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df_monthly.head()

# Drop the first row
df_monthly = df_monthly.drop([0])
df_monthly.head()

# Since we have '---' in the data set, we need to replace them with NaN
df_monthly = pd.DataFrame(df_monthly.replace('---', np.nan))
df_monthly.head()

Then we melt the data to convert it into a long format for better visualization and analysis.

# Melt the monthly data for easier analysis
monthly_melted = df_monthly.melt(id_vars=['Year'], var_name='Month', value_name='Elevation')
monthly_melted['Date'] = pd.to_datetime(monthly_melted['Year'].astype(str) + '-' + monthly_melted['Month'], format='%Y-%b')
monthly_melted = monthly_melted.sort_values('Date')

We then implement the long-term trend analysis by visualizing the monthly water levels over time.

# Long-term Trend Analysis
plt.figure(figsize=(15, 7))
plt.plot(monthly_melted['Date'], monthly_melted['Elevation'], label='Monthly Elevation')
plt.title('Lake Mead Water Elevation (1935-2021)')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Lake_Mead_Water_Elevation.png')
plt.show()

# Calculate and plot 5-year moving average
monthly_melted.set_index('Date', inplace=True)
rolling_avg = monthly_melted['Elevation'].rolling(window=60).mean()
plt.figure(figsize=(15, 7))
plt.plot(monthly_melted.index, monthly_melted['Elevation'], label='Monthly Elevation', alpha=0.5)
plt.plot(rolling_avg.index, rolling_avg, label='5-Year Moving Average', linewidth=2, color='red')
plt.title('Lake Mead Water Elevation with 5-Year Moving Average')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Lake_Mead_Water_Elevation_5Year_Moving_Avg.png')
plt.show()

The output of the code will be two plots showing the monthly water levels of Lake Mead from 1935 to 2021 and the 5-year moving average of the water levels.

Lake Mead Water Elevation

Lake Mead Water Elevation with 5-Year Moving Average

We then load the low and high elevation data for Lake Mead from 1935 to 2021 and visualize the time series to identify trends and patterns.

# Load the data
df_annual = pd.read_excel('2021_HiMCM_LakeMead_LowHighElevationData.xlsx')
df_annual.head()

# Drop the last three columns
df_annual = df_annual.drop(columns=['Unnamed: 7', 'Unnamed: 8', 'Unnamed: 9'])

# Change the column names
df_annual.columns = ['Year', 'Date_Low', 'Time', 'Low', 'Date_High', 'Time', 'High']
df_annual.head()

# Drop the first row
df_annual = df_annual.drop([0])
df_annual.head()

# Convert date to datetime
df_annual['Date_Low'] = pd.to_datetime(df_annual['Date_Low'])
df_annual['Date_High'] = pd.to_datetime(df_annual['Date_High'])
df_annual.head()

Like before, we have to preprocess the data and visualize the annual low and high water levels of Lake Mead over time.

# Convert the data to float
df_annual['Year'] = df_annual['Year'].astype(float)
df_annual['Low'] = df_annual['Low'].astype(float)
df_annual['High'] = df_annual['High'].astype(float)

Then we visualize the annual low and high water levels of Lake Mead over time.

# Annual High and Low Water Levels Over Time
plt.figure()
plt.plot(df_annual['Year'], df_annual['Low'], label='Low Elevation', color='red')
plt.plot(df_annual['Year'], df_annual['High'], label='High Elevation', color='blue')
plt.fill_between(df_annual['Year'], df_annual['High'], df_annual['Low'], alpha=0.3)
plt.title('Annual High and Low Water Elevations in Lake Mead (1935-2021)')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Lake_Mead_Annual_High_Low_Elevation.png')
plt.show()

Annual High and Low Water Elevations in Lake Mead

Thereafter, we implement the water level range analysis by visualizing the annual water level range of Lake Mead over time.

# Water Level Range Analysis
df_annual['Level_Range'] = df_annual['High'] - df_annual['Low']

plt.figure()
plt.plot(df_annual['Year'], df_annual['Level_Range'], color='green')
plt.title('Annual Water Level Range in Lake Mead (1935-2021)')
plt.xlabel('Year')
plt.ylabel('Water Level Range (feet)')
plt.grid(True)
plt.savefig('Lake_Mead_Annual_Water_Level_Range.png')
plt.show()

print('The average water level range is', df_annual['Level_Range'].mean(), 'feet.')
print('The maximum water level range is', df_annual['Level_Range'].max(), 'feet.')
print('The minimum water level range is', df_annual['Level_Range'].min(), 'feet.')

The output of the code will be a plot showing the annual water level range of Lake Mead from 1935 to 2021.

Annual Water Level Range in Lake Mead

Also the output for the average, maximum, and minimum water level range will be displayed.

The average water level range is 27.16883720930232 feet.
The maximum water level range is 254.95000000000005 feet.
The minimum water level range is 3.7899999999999636 feet.

We then have the seasonal pattern for high and low water levels visualized by plotting the monthly water levels of Lake Mead over time.

# Seasonal Pattern of High and Low Water Levels
plt.figure()
plt.scatter(df_annual['Date_Low'].dt.month, df_annual['Year'], label='Annual Low', alpha=0.5, color='red')
plt.scatter(df_annual['Date_High'].dt.month, df_annual['Year'], label='Annual High', alpha=0.5, color='blue')
plt.title('Seasonal Pattern of High and Low Water Elevations in Lake Mead')
plt.xlabel('Month')
plt.ylabel('Year')
plt.legend()
plt.grid(True)
plt.savefig('Lake_Mead_Seasonal_Pattern.png')
plt.show()

print('Most Common Month for Low Water Levels:', df_annual['Date_Low'].dt.month.mode().values[0])
print('Most Common Month for High Water Levels:', df_annual['Date_High'].dt.month.mode().values[0])

Seasonal Pattern of High and Low Water Elevations in Lake Mead

The output of the code will display the most common month for low and high water levels in Lake Mead.

Most Common Month for Low Water Levels: 4
Most Common Month for High Water Levels: 7

Last but not least, we have the drought period analysis by visualizing the drought periods in Lake Mead based on the annual water levels.

# Drought Period Analysis
overall_mean_low = df_annual['Low'].mean()

# Plot the data
plt.figure()
plt.plot(df_annual['Year'], df_annual['Low'], label='Annual Low Elevation', color='red')
plt.axhline(y=overall_mean_low, color='red', linestyle='--', label='Overall Low Elevation Mean')
plt.fill_between(df_annual['Year'], df_annual['High'], df_annual['Low'], alpha=0.3)
plt.title('Annual High and Low Water Elevations in Lake Mead (1935-2021)')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Lake_Mead_Annual_High_Low_Elevation_Drought.png')
plt.show()

Annual High and Low Water Elevations in Lake Mead during Drought Periods

Not only visualization, we also have to identify the drought periods in Lake Mead based on the annual water levels.

# Identify drought periods
df_annual['is_drought'] = df_annual['Low'] < overall_mean_low
df_annual['drought_start'] =(df_annual['is_drought'] & ~df_annual['is_drought'].shift(1).fillna(False)).infer_objects(copy=False)
df_annual['drought_end'] = (df_annual['is_drought'] & ~df_annual['is_drought'].shift(-1).fillna(False)).infer_objects(copy=False)

drought_starts = df_annual[df_annual['drought_start']]['Year']
drought_ends = df_annual[df_annual['drought_end']]['Year']

print("Identified Drought Periods:")
for start, end in zip(pd.to_datetime(drought_starts, format='%Y'), pd.to_datetime(drought_ends, format='%Y')):
    duration = end.year - start.year + 1
    print(f"Start: {start.year}, End: {end.year}, Duration: {duration} years")

# Calculate the duration of the most recent drought
drought_starts = pd.to_datetime(drought_starts, format='%Y')
drought_ends = pd.to_datetime(drought_ends, format='%Y')

if len(drought_starts) > 0 and len(drought_ends) > 0:
    most_recent_start = drought_starts.iloc[-1]
    most_recent_end = drought_ends.iloc[-1] if drought_ends.iloc[-1] > most_recent_start else pd.Timestamp.now()
    most_recent_duration = most_recent_end.year - most_recent_start.year + 1
    print(f"\nMost recent drought period:")
    print(f"Start: {most_recent_start.year}, End: {most_recent_end.year}, Duration: {most_recent_duration} years")

# Compare the most recent drought to earlier ones
if len(drought_starts) > 1:
    earlier_droughts = [end.year - start.year + 1 for start, end in zip(drought_starts[:-1], drought_ends[:-1])]
    avg_earlier_duration = np.mean(earlier_droughts)
    print(f"\nAverage duration of earlier droughts: {avg_earlier_duration:.1f} years")
    print(f"The most recent drought is {most_recent_duration / avg_earlier_duration:.1f} times longer than the average of earlier droughts.")

We therefore have the identified drought periods in Lake Mead based on the annual water levels displayed.

Identified Drought Periods:
Start: 1935, End: 1938, Duration: 4 years
Start: 1947, End: 1947, Duration: 1 years
Start: 1952, End: 1952, Duration: 1 years
Start: 1954, End: 1957, Duration: 4 years
Start: 1963, End: 1968, Duration: 6 years
Start: 2004, End: 2020, Duration: 17 years

Most recent drought period:
Start: 2004, End: 2020, Duration: 17 years

Average duration of earlier droughts: 3.2 years
The most recent drought is 5.3 times longer than the average of earlier droughts.

Step 3: Time Series Analysis

Now is the time to implement the SARIMAX model for time series analysis of Lake Mead’s water levels. We first define the stationarity test function to check the stationarity of the time series data.

# Check for stationarity
def test_stationarity(timeseries):
    # Original ADF test
    result = adfuller(timeseries, autolag='AIC')
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    print('Critical Values:', result[4])
    
    if result[1] <= 0.05:
        print("Data is stationary")
        return
    
    print("Data is non-stationary. Analyzing type of non-stationarity:")
    
    # Check for trend
    trend = np.polyfit(range(len(timeseries)), timeseries, 1)[0]
    if abs(trend) > 0.1:  # You may need to adjust this threshold
        print("- Trend non-stationarity detected")
        print("  Suggestion: Try first-order differencing")
        diff = timeseries.diff().dropna()
        test_stationarity(diff)
        return
    
    # Check for changing variance
    rolling_std = timeseries.rolling(window=12).std()
    if rolling_std.max() / rolling_std.min() > 1.5:  # You may need to adjust this threshold
        print("- Variance non-stationarity detected")
        print("  Suggestion: Try log transformation")
        if (timeseries > 0).all():
            log_data = np.log(timeseries)
            test_stationarity(log_data)
        else:
            print("  Cannot apply log transformation due to non-positive values")
        return
    
    # Check for seasonality
    try:
        result = seasonal_decompose(timeseries, model='additive', period=12)
        seasonal_strength = np.max(np.abs(result.seasonal)) / np.std(timeseries)
        if seasonal_strength > 0.1:  # You may need to adjust this threshold
            print("- Seasonal non-stationarity detected")
            print("  Suggestion: Try seasonal differencing")
            seasonal_diff = timeseries.diff(12).dropna()
            test_stationarity(seasonal_diff)
            return
    except:
        print("Could not perform seasonal decomposition")
    
    print("No specific type of non-stationarity detected. Consider other transformations or models.")

# Visualize the time series
def plot_timeseries(timeseries):
    name = timeseries.name if hasattr(timeseries, 'name') else 'Time Series'
    plt.figure(figsize=(12,6))
    plt.plot(timeseries)
    plt.title('Time Series Plot')
    plt.xlabel('Time')
    plt.ylabel('Value')
    plt.grid(True)
    plt.savefig(f"{name}_Time_Series_Plot.png")
    plt.show()

We then take the period from 2000 onwards for the time series analysis

# Identify the most recent drought period (e.g., from 2000 onwards)
recent_drought = monthly_melted.loc['2000':].copy()
recent_drought = recent_drought.reset_index()

# Check for nan or inf values and replace them with interpolated values
recent_drought['Elevation'] = recent_drought['Elevation'].replace([np.inf, -np.inf], np.nan).interpolate()

We then test the stationarity of the time series data and visualize the time series plot.

# Test for stationarity
test_stationarity(recent_drought['Elevation'])

# Plot the time series
plot_timeseries(recent_drought['Elevation'])

However, the output is not satisfactory, so we have to apply first-order differencing to make the data stationary.

ADF Statistic: -2.2489728081746634
p-value: 0.18895339766918978
Critical Values: {'1%': -3.4569962781990573, '5%': -2.8732659015936024, '10%': -2.573018897632674}
Data is non-stationary. Analyzing type of non-stationarity:
- Trend non-stationarity detected
  Suggestion: Try first-order differencing
ADF Statistic: -4.349359625233902
p-value: 0.0003644583800686272
Critical Values: {'1%': -3.4569962781990573, '5%': -2.8732659015936024, '10%': -2.573018897632674}
Data is stationary

Time Series Plot

We then apply the 1st order differencing to the time series data and retest for stationarity.

# First-order differencing
diff = recent_drought['Elevation'].diff().dropna()
test_stationarity(diff)

# Save back to the dataframe
recent_drought['Elevation_diff'] = diff

# Drop the first row
recent_drought = recent_drought.drop([0])

# Plot the time series
plot_timeseries(recent_drought['Elevation_diff'])

Now, the output is much more solid and we can proceed with the SARIMAX model.

ADF Statistic: -4.349359625233902
p-value: 0.0003644583800686272
Critical Values: {'1%': -3.4569962781990573, '5%': -2.8732659015936024, '10%': -2.573018897632674}
Data is stationary

Time Series Plot after 1st Order Differencing

We then combine the data of the same year and calculate the mean for numeric columns only.

# Combine the data of the same year and calculate the mean for numeric columns only
annual_recent_drought = recent_drought.groupby(recent_drought['Date'].dt.year).mean(numeric_only=True)
annual_recent_drought['Year'] = pd.to_datetime(annual_recent_drought['Year'], format='%Y')
annual_recent_drought.head()

After that, we decompose the time series data into trend, seasonal, and residual components.

# Decompose the time series
result = seasonal_decompose(recent_drought['Elevation_diff'], model='additive', period=12)

We then visualize the decomposed time series data to identify the trend, seasonal, and residual components.

# Plot the decomposition
plt.figure(figsize=(18, 12))
plt.title('Decomposition of the Time Series')
plt.subplot(411)
plt.plot(recent_drought['Date'], recent_drought['Elevation'], label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(recent_drought['Date'], result.trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(recent_drought['Date'], result.seasonal, label='Seasonal')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(recent_drought['Date'], result.resid, label='Residual')
plt.legend(loc='best')
plt.tight_layout()
plt.savefig('Decomposition_of_Time_Series.png')
plt.show()

Decomposition of the Time Series

Since then, we apply the auto ARIMA model to determine the optimal parameters for the SARIMAX model.

# Auto ARIMA
model = auto_arima(annual_recent_drought['Elevation_diff'], seasonal=True, m=12, stepwise=False, trace=True)
model.summary()
best_params = model.get_params()
print(best_params)
best_model = SARIMAX(annual_recent_drought['Elevation_diff'], order=best_params['order'], seasonal_order=best_params['seasonal_order'])
best_fit = best_model.fit()
best_fit.summary()

The output is the optimal parameters for the SARIMAX model.

Best model:  ARIMA(1,0,2)(0,1,0)[12] intercept
Total fit time: 25.663 seconds
{'maxiter': 50, 'method': 'lbfgs', 'order': (1, 0, 2), 'out_of_sample_size': 0, 'scoring': 'mse', 'scoring_args': {}, 'seasonal_order': (0, 1, 0, 12), 'start_params': None, 'suppress_warnings': True, 'trend': None, 'with_intercept': True}

We then set up the forecast period and predict future water levels using the SARIMAX model.

# Get forecast for the next 30 years
forecast_steps = 30
forecast = best_fit.get_forecast(steps=forecast_steps)
forecast_mean = forecast.predicted_mean
forecast_ci = forecast.conf_int()

# Get the last year from your data
last_year = int(annual_recent_drought.index.max())

# Create forecast index starting from the year after the last year in your data
forecast_index = pd.date_range(start=str(last_year + 1), periods=forecast_steps, freq='YE')

Visualizing the forecasted water levels and confidence intervals for the next 30 years. In my opinion, since SARIMAX model is a time series model, it is more comprehensive if we plot the confidence interval as well.

# Plot the forecast for 1st oder differenced data
plt.figure()
plt.plot(annual_recent_drought['Year'], annual_recent_drought['Elevation_diff'], label='Observed', color='blue')
plt.plot(forecast_index, forecast_mean, label='Forecast', color='red')
plt.fill_between(forecast_index, forecast_ci.iloc[:, 0], forecast_ci.iloc[:, 1], color='green', alpha=0.2, label = 'Confidence Interval')
plt.title('Forecast of Lake Mead Water Elevation (1st Order Differenced Data)')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Forecast_Lake_Mead_Water_Elevation_1st_Order_Differenced_Data_SARIMAX.png')
plt.show()

Forecast of Lake Mead Water Elevation (1st Order Differenced Data)

We then invert the differencing to obtain the forecasted water levels in the original scale.

# Get the last observed value from your original (non-differenced) data
last_observed_value = recent_drought['Elevation'].iloc[-1]

# Invert the forecasted values with the first order differencing
forecasted_diff = forecast_mean
forecast_ci_diff = forecast_ci
forecasted_values = np.r_[last_observed_value, forecasted_diff].cumsum().astype(float)[1:]

# Separate cumulative sum for lower and upper bounds
forecasted_ci_lower = np.r_[last_observed_value, forecast_ci_diff.iloc[:, 0]].cumsum().astype(float)[1:]
forecasted_ci_upper = np.r_[last_observed_value, forecast_ci_diff.iloc[:, 1]].cumsum().astype(float)[1:]

# Create a forecasted dataframe with separate CI columns for the next 40 years
forecasted_df = pd.DataFrame({
    'Year': forecast_index,
    'Elevation': forecasted_values,
    'Elevation_CI_Lower': forecasted_ci_lower,
    'Elevation_CI_Upper': forecasted_ci_upper
})

forecasted_df['Year'] = pd.to_datetime(forecasted_df['Year'], format='%Y')

forecasted_df.head()

Visualizing the forecasted water levels and confidence intervals for the next 30 years in the original scale.

# Plot the forecast for the original data
plt.figure()
plt.plot(annual_recent_drought['Year'], annual_recent_drought['Elevation'], label='Observed', color='blue')
plt.plot(forecasted_df['Year'], forecasted_df['Elevation'], label='Forecast', color='red')
plt.fill_between(forecasted_df['Year'], forecasted_df['Elevation_CI_Lower'], forecasted_df['Elevation_CI_Upper'], color='green', alpha=0.2, label='Confidence Interval')
plt.title('Forecast of Lake Mead Water Elevation')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Forecast_Lake_Mead_Water_Elevation_SARIMAX.png')
plt.show()

Forecast of Lake Mead Water Elevation

Step 4: Structural Time Series Analysis

We use the structural time series model to decompose the observed data into trend, seasonal, and error components.

# Using another model to forecast the water elevation
## Unobserved Components Model
### Find the best parameters
model_ucm = UnobservedComponents(annual_recent_drought['Elevation_diff'], 'local linear trend', seasonal=12)
fit_ucm = model_ucm.fit()
fit_ucm.summary()

# Forecast the next 30 years
forecast_ucm = fit_ucm.get_forecast(steps=forecast_steps)
forecast_mean_ucm = forecast_ucm.predicted_mean
forecast_ci_ucm = forecast_ucm.conf_int()

# Get the last year from your data
last_year = int(annual_recent_drought.index.max())

# Plot the forecast for the 1st order differenced data
plt.figure()
plt.plot(annual_recent_drought['Year'], annual_recent_drought['Elevation_diff'], label='Observed', color='blue')
plt.plot(forecast_index, forecast_mean_ucm, label='Forecast_UCM', color='red')
plt.plot(forecast_index, forecast_mean, label='Forecast_SARIMAX', color='purple')
plt.fill_between(forecast_index, forecast_ci_ucm.iloc[:, 0], forecast_ci_ucm.iloc[:, 1], color='green', alpha=0.2, label='Confidence Interval_UCM')
plt.fill_between(forecast_index, forecast_ci.iloc[:, 0], forecast_ci.iloc[:, 1], color='orange', alpha=0.2, label='Confidence Interval_SARIMAX')
plt.title('Forecast of Lake Mead Water Elevation (1st Order Differenced Data)')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Forecast_Lake_Mead_Water_Elevation_1st_Order_Differenced_Data_UCM_ARIMA.png')
plt.show()

Forecast of Lake Mead Water Elevation (1st Order Differenced Data) using UCM and ARIMA

We also invert the differencing to obtain the forecasted water levels in the original scale.

# Invert the forecasted values with the first order differencing
forecasted_diff_ucm = forecast_mean_ucm
forecast_ci_diff_ucm = forecast_ci_ucm
forecasted_values_ucm = np.r_[last_observed_value, forecasted_diff_ucm].cumsum().astype(float)[1:]

# Separate cumulative sum for lower and upper bounds
forecasted_ci_lower_ucm = np.r_[last_observed_value, forecast_ci_diff_ucm.iloc[:, 0]].cumsum().astype(float)[1:]
forecasted_ci_upper_ucm = np.r_[last_observed_value, forecast_ci_diff_ucm.iloc[:, 1]].cumsum().astype(float)[1:]

# Create a forecasted dataframe with separate CI columns for the next 30 years
forecasted_df_ucm = pd.DataFrame({
    'Year': forecast_index,
    'Elevation_UCM': forecasted_values_ucm,
    'Elevation_CI_Lower_UCM': forecasted_ci_lower_ucm,
    'Elevation_CI_Upper_UCM': forecasted_ci_upper_ucm
})

forecasted_df_ucm['Year'] = pd.to_datetime(forecasted_df_ucm['Year'], format='%Y')

# Plot the forecast for the original data
plt.figure()
plt.plot(annual_recent_drought['Year'], annual_recent_drought['Elevation'], label='Observed', color='blue')
plt.plot(forecasted_df['Year'], forecasted_df['Elevation'], label='Forecast_SARIMAX', color='purple')
plt.plot(forecasted_df_ucm['Year'], forecasted_df_ucm['Elevation_UCM'], label='Forecast_UCM', color='red')
plt.fill_between(forecasted_df['Year'], forecasted_df['Elevation_CI_Lower'], forecasted_df['Elevation_CI_Upper'], color='green', alpha=0.2, label='Confidence Interval_SARIMAX')
plt.fill_between(forecasted_df_ucm['Year'], forecasted_df_ucm['Elevation_CI_Lower_UCM'], forecasted_df_ucm['Elevation_CI_Upper_UCM'], color='orange', alpha=0.2, label='Confidence Interval_UCM')
plt.title('Forecast of Lake Mead Water Elevation')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Forecast_Lake_Mead_Water_Elevation_UCM_ARIMA.png')
plt.show()

Forecast of Lake Mead Water Elevation using UCM and ARIMA

For better comparison, we set the index to the ‘Year’ column and merge the forecasted data from both models.

# Set the index to 'Year'
forecasted_df.set_index('Year', inplace=True)
forecasted_df_ucm.set_index('Year', inplace=True)

Step 5: Forecast with Only Recent Data

We then forecast the water levels for the next 30 years using only the recent data (from 2010 onwards) to compare the results with the full dataset.

# Just using the recent 10 years data
recent_10_years = annual_recent_drought.loc['2011':].copy()

# Auto ARIMA
model_10 = auto_arima(recent_10_years['Elevation_diff'], D=0, seasonal=True, m=12, stepwise=False, trace=True)
model_10.summary()
best_params_10 = model_10.get_params()
print(best_params_10)

The best parameters for the SARIMAX model using only the recent data will be displayed.

Best model:  ARIMA(0,0,0)(0,0,0)[12] intercept
Total fit time: 14.006 seconds
{'maxiter': 50, 'method': 'lbfgs', 'order': (0, 0, 0), 'out_of_sample_size': 0, 'scoring': 'mse', 'scoring_args': {}, 'seasonal_order': (0, 0, 0, 12), 'start_params': None, 'suppress_warnings': True, 'trend': None, 'with_intercept': True}

which is a strange result since the model is not able to find the optimal parameters for the SARIMAX model using only the recent data.

We then also fit both the SARIMAX and UCM models using only the recent data.

# SARIMAX
best_model_10 = SARIMAX(recent_10_years['Elevation_diff'], order=best_params_10['order'], seasonal_order=best_params_10['seasonal_order'])
best_fit_10 = best_model_10.fit()
best_fit_10.summary() 

# Forecast the next 30 years
forecast_10 = best_fit_10.get_forecast(steps=forecast_steps)
forecast_mean_10 = forecast_10.predicted_mean
forecast_ci_10 = forecast_10.conf_int()

# Using Structural Time Series Model
model_ucm_10 = UnobservedComponents(recent_10_years['Elevation_diff'], 'local linear trend', seasonal=12)
fit_ucm_10 = model_ucm_10.fit()
fit_ucm_10.summary()

# Forecast the next 30 years
forecast_ucm_10 = fit_ucm_10.get_forecast(steps=forecast_steps)
forecast_mean_ucm_10 = forecast_ucm_10.predicted_mean
forecast_ci_ucm_10 = forecast_ucm_10.conf_int()

# Plot the forecast for the 1st order differenced data
plt.figure()
plt.plot(recent_10_years['Year'], recent_10_years['Elevation_diff'], label='Observed', color='blue')
plt.plot(forecast_index, forecast_mean_10, label='Forecast_10', color='red')
plt.plot(forecast_index, forecast_mean, label='Forecast_SARIMAX', color='purple')
plt.plot(forecast_index, forecast_mean_ucm, label='Forecast_UCM', color='green')
plt.plot(forecast_index, forecast_mean_ucm_10, label='Forecast_UCM_10', color='orange')
plt.title('Forecast of Lake Mead Water Elevation (1st Order Differenced Data)')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Forecast_Lake_Mead_Water_Elevation_1st_Order_Differenced_Data_UCM_ARIMA_10.png')
plt.show()

Forecast of Lake Mead Water Elevation (1st Order Differenced Data) using UCM and ARIMA with only recent data

We then invert the differencing to obtain the forecasted water levels in the original scale.

# Invert the forecasted values with the first order differencing
forecasted_diff_10 = forecast_mean_10
forecast_ci_diff_10 = forecast_ci_10
forecasted_values_10 = np.r_[last_observed_value, forecasted_diff_10].cumsum().astype(float)[1:]

# Separate cumulative sum for lower and upper bounds
forecasted_ci_lower_10 = np.r_[last_observed_value, forecast_ci_diff_10.iloc[:, 0]].cumsum().astype(float)[1:]
forecasted_ci_upper_10 = np.r_[last_observed_value, forecast_ci_diff_10.iloc[:, 1]].cumsum().astype(float)[1:]

# Create a forecasted dataframe with separate CI columns for the next 30 years
forecasted_df_10 = pd.DataFrame({
    'Year': forecast_index,
    'Elevation_10': forecasted_values_10,
    'Elevation_CI_Lower_10': forecasted_ci_lower_10,
    'Elevation_CI_Upper_10': forecasted_ci_upper_10
})

forecasted_df_10['Year'] = pd.to_datetime(forecasted_df_10['Year'], format='%Y')
forecasted_df_10.set_index('Year', inplace=True)

# Invert the forecasted values with the first order differencing
forecasted_diff_ucm_10 = forecast_mean_ucm_10
forecast_ci_diff_ucm_10 = forecast_ci_ucm_10
forecasted_values_ucm_10 = np.r_[last_observed_value, forecasted_diff_ucm_10].cumsum().astype(float)[1:]

# Separate cumulative sum for lower and upper bounds
forecasted_ci_lower_ucm_10 = np.r_[last_observed_value, forecast_ci_diff_ucm_10.iloc[:, 0]].cumsum().astype(float)[1:]
forecasted_ci_upper_ucm_10 = np.r_[last_observed_value, forecast_ci_diff_ucm_10.iloc[:, 1]].cumsum().astype(float)[1:]

# Create a forecasted dataframe with separate CI columns for the next 30 years
forecasted_df_ucm_10 = pd.DataFrame({
    'Year': forecast_index,
    'Elevation_UCM_10': forecasted_values_ucm_10,
    'Elevation_CI_Lower_UCM_10': forecasted_ci_lower_ucm_10,
    'Elevation_CI_Upper_UCM_10': forecasted_ci_upper_ucm_10
})

forecasted_df_ucm_10['Year'] = pd.to_datetime(forecasted_df_ucm_10['Year'], format='%Y')
forecasted_df_ucm_10.set_index('Year', inplace=True)

# Plot the forecast for the original data
plt.figure()
plt.plot(recent_10_years['Year'], recent_10_years['Elevation'], label='Observed', color='blue')
plt.plot(forecast_index, forecasted_df['Elevation'], label='Forecast_SARIMAX', color='purple')
plt.plot(forecast_index, forecasted_df_ucm['Elevation_UCM'], label='Forecast_UCM', color='green')
plt.plot(forecast_index, forecasted_df_10['Elevation_10'], label='Forecast_SARIMAX_10', color='red')
plt.plot(forecast_index, forecasted_df_ucm_10['Elevation_UCM_10'], label='Forecast_UCM_10', color='orange')
plt.title('Forecast of Lake Mead Water Elevation')
plt.xlabel('Year')
plt.ylabel('Elevation (feet above sea level)')
plt.legend()
plt.grid(True)
plt.savefig('Forecast_Lake_Mead_Water_Elevation_UCM_ARIMA_10.png')
plt.show()

Forecast of Lake Mead Water Elevation using UCM and ARIMA with only recent data

Step 6: Results (so far)

We print the predicted water levels and confidence intervals for the next 30 years using both the SARIMAX and UCM models with the full dataset and only the recent data.

# Printing the forecasted values for 2025, 2030, and 2050
## SARIMAX
print(f"Forecasted Elevation for 2025 (SARIMAX): {float(forecasted_df.loc['2025', 'Elevation']):.2f} feet")
print(f"Forecasted Elevation for 2030 (SARIMAX): {float(forecasted_df.loc['2030', 'Elevation']):.2f} feet")
print(f"Forecasted Elevation for 2050 (SARIMAX): {float(forecasted_df.loc['2050', 'Elevation']):.2f} feet")

## UCM
print(f"Forecasted Elevation for 2025 (UCM): {float(forecasted_df_ucm.loc['2025', 'Elevation_UCM']):.2f} feet")
print(f"Forecasted Elevation for 2030 (UCM): {float(forecasted_df_ucm.loc['2030', 'Elevation_UCM']):.2f} feet")
print(f"Forecasted Elevation for 2050 (UCM): {float(forecasted_df_ucm.loc['2050', 'Elevation_UCM']):.2f} feet")

## SARIMAX_10
print(f"Forecasted Elevation for 2025 (SARIMAX_10): {float(forecasted_df_10.loc['2025', 'Elevation_10']):.2f} feet")
print(f"Forecasted Elevation for 2030 (SARIMAX_10): {float(forecasted_df_10.loc['2030', 'Elevation_10']):.2f} feet")
print(f"Forecasted Elevation for 2050 (SARIMAX_10): {float(forecasted_df_10.loc['2050', 'Elevation_10']):.2f} feet")

## UCM_10
print(f"Forecasted Elevation for 2025 (UCM_10): {float(forecasted_df_ucm_10.loc['2025', 'Elevation_UCM_10']):.2f} feet")
print(f"Forecasted Elevation for 2030 (UCM_10): {float(forecasted_df_ucm_10.loc['2030', 'Elevation_UCM_10']):.2f} feet")
print(f"Forecasted Elevation for 2050 (UCM_10): {float(forecasted_df_ucm_10.loc['2050', 'Elevation_UCM_10']):.2f} feet")

The predicted water levels for 2025, 2030, and 2050 using both the SARIMAX and UCM models with the full dataset and only the recent data will be displayed.

Forecasted Elevation for 2025 (SARIMAX): 1068.71 feet
Forecasted Elevation for 2030 (SARIMAX): 1066.61 feet
Forecasted Elevation for 2050 (SARIMAX): 1062.70 feet
Forecasted Elevation for 2025 (UCM): 1070.91 feet
Forecasted Elevation for 2030 (UCM): 1071.24 feet
Forecasted Elevation for 2050 (UCM): 1088.30 feet
Forecasted Elevation for 2025 (SARIMAX_10): 1067.68 feet
Forecasted Elevation for 2030 (SARIMAX_10): 1067.68 feet
Forecasted Elevation for 2050 (SARIMAX_10): 1067.68 feet
Forecasted Elevation for 2025 (UCM_10): 1062.64 feet
Forecasted Elevation for 2030 (UCM_10): 1050.98 feet
Forecasted Elevation for 2050 (UCM_10): 969.96 feet

Step 7: Further Model (Ensemble Model)

We then create an ensemble model by creating a dynamic weighted average based on their error metrics.

# Ensemble Forecasting
## Dynamic Weighted Average
def dynamic_weighted_average(forecasts, weights=None):
    error1 = np.abs(forecasts[0] - forecasts[1])
    error2 = np.abs(forecasts[1] - forecasts[2])
    error3 = np.abs(forecasts[0] - forecasts[2])
    total_error = error1 + error2 + error3
    weight1 = error2 / total_error
    weight2 = error1 / total_error
    weight3 = error3 / total_error
    return forecasts[0] * weight1 + forecasts[1] * weight2 + forecasts[2] * weight3

# Get the forecasted values for 2025, 2030, and 2050
forecast_2025 = [forecasted_df.loc['2025', 'Elevation'], forecasted_df_ucm.loc['2025', 'Elevation_UCM'], forecasted_df_10.loc['2025', 'Elevation_10']]
forecast_2030 = [forecasted_df.loc['2030', 'Elevation'], forecasted_df_ucm.loc['2030', 'Elevation_UCM'], forecasted_df_10.loc['2030', 'Elevation_10']]
forecast_2050 = [forecasted_df.loc['2050', 'Elevation'], forecasted_df_ucm.loc['2050', 'Elevation_UCM'], forecasted_df_10.loc['2050', 'Elevation_10']]
ensemble_2025 = dynamic_weighted_average(forecast_2025)
ensemble_2030 = dynamic_weighted_average(forecast_2030)
ensemble_2050 = dynamic_weighted_average(forecast_2050)

# Print the ensemble forecasted values
print(f"Ensemble Forecasted Elevation for 2025: {float(ensemble_2025):.2f} feet")
print(f"Ensemble Forecasted Elevation for 2030: {float(ensemble_2030):.2f} feet")
print(f"Ensemble Forecasted Elevation for 2050: {float(ensemble_2050):.2f} feet")

The ensemble forecasted water levels for 2025, 2030, and 2050 will be displayed.

Ensemble Forecasted Elevation for 2025: 1069.29 feet
Ensemble Forecasted Elevation for 2030: 1069.05 feet
Ensemble Forecasted Elevation for 2050: 1075.98 feet

Discussion

Summary of Key Findings:
- Our analysis of Lake Mead’s water levels using SARIMAX, Unobserved Components Model (UCM), and ensemble forecasting methods has revealed several critical insights:
- The SARIMAX model, using data from 2000 onwards, predicts a continued decline in water levels, with elevations of $1068.71$ feet, $1066.61$ feet, and $1062.70$ feet for 2025, 2030, and 2050 respectively.
- The UCM, utilizing the same data range, suggests a more optimistic scenario with slight increases, predicting $1070.91$ feet, $1071.24$ feet, and $1088.30$ feet for the same years.
- Models using only recent data (2010 onwards) show divergent predictions, with the SARIMAX model forecasting stable levels around 1067.68 feet, while the UCM predicts a severe decline to $969.96$ feet by 2050. Our ensemble model, which dynamically weights these predictions, forecasts elevations of $1069.29$ feet, $1069.05$ feet, and $1075.98$ feet for 2025, 2030, and 2050 respectively.
Interpretation of Results:
- The divergence between models highlights the uncertainty in long-term water level predictions for Lake Mead.
- The SARIMAX model’s prediction of continued decline aligns with the observed trend since 2000, suggesting persistent drought conditions.
- The UCM’s more optimistic forecast may be capturing potential cyclical patterns or assuming some recovery from current drought conditions.
- The stark difference in predictions when using only recent data underscores the sensitivity of these models to the input data range and the severity of the recent drought period.
Comparison with Existing Knowledge:
- Our findings align with the Bureau of Reclamation’s concerns about declining water levels, which led to the first-ever water shortage declaration in 2021.
- The predicted levels, even in the most optimistic scenarios, remain well below the historical highs of over $1200$ feet, indicating a long-term shift in Lake Mead’s water availability.
Implications:
- Water Supply: Even our most optimistic predictions suggest that Lake Mead will continue to operate at reduced capacity, necessitating continued water conservation measures and potentially more severe water use restrictions.
- Agriculture: The agricultural sector, being the first to face cuts during shortages, may need to adapt to permanently reduced water allocations, potentially leading to shifts in crop types or irrigation methods.
- Hydroelectric Power: Continued low water levels could impact the efficiency and output of the Hoover Dam’s hydroelectric generators, affecting power supply to the region.
- Ecosystem: Persistent low water levels could have long-term effects on the local ecosystem, potentially altering habitats for various species.
Limitations:
- Our models are based on historical data and may not fully account for future climate change impacts or potential policy interventions.
- The models don’t incorporate potential changes in water management practices or technological advancements that could alter future water levels.
- External factors such as population growth and changes in water consumption patterns are not directly factored into these time series models.
Future Research:
- Incorporate climate change models to refine long-term predictions.
- Develop more sophisticated models that can account for policy interventions and changes in water management practices.
- Conduct a detailed analysis of the potential impact of large-scale wastewater recycling on Lake Mead’s water levels.

Conclusion

The Lake Mead drought crisis presents a complex and urgent challenge for water management in the southwestern United States. Our comprehensive analysis, utilizing multiple forecasting models, paints a picture of continued stress on this crucial water resource:

All models predict that Lake Mead’s water levels will remain significantly below historical averages for the foreseeable future, with elevations likely to hover around $1070$ feet above sea level by 2030.
The divergence in long-term predictions (2050) highlights the uncertainty in forecasting water levels over extended periods and underscores the need for adaptive management strategies.
The severity of the current drought, as captured by models using only recent data, suggests that without significant intervention, Lake Mead could face critical water shortages in the coming decades.
The ensemble model, which combines insights from multiple forecasting approaches, provides a balanced prediction that can serve as a robust basis for planning purposes.
Given these projections, it is clear that current water usage patterns are unsustainable, necessitating a multi-faceted approach to water conservation and management.
Wastewater recycling emerges as a critical potential solution to augment water supplies. Our analysis suggests that implementing large-scale wastewater recycling could help offset future shortfalls, though it is unlikely to completely solve the water shortage issue on its own.
Future water management strategies for Lake Mead should focus on:
- Implementing stringent water conservation measures across all sectors
- Investing in and scaling up wastewater recycling technologies
- Developing more efficient water use practices, particularly in agriculture
- Continuing to monitor and refine predictive models to inform adaptive management policies
- Exploring additional innovative solutions such as desalination or water importation schemes
The Lake Mead crisis serves as a stark reminder of the impacts of climate change on water resources and the need for proactive, science-based water management policies.

In conclusion, while the future of Lake Mead presents significant challenges, it also offers an opportunity to revolutionize our approach to water management. By combining innovative technologies like wastewater recycling with conservation efforts and adaptive policies, we can work towards ensuring water security for the millions who depend on this vital resource. The path forward will require collaboration between policymakers, scientists, and communities, guided by continual monitoring and modeling of Lake Mead’s water levels.

Share on

Twitter Facebook LinkedIn

Sok Kin Cheng