Compare data from the Disdrometer and Rain Gauge¶

import xarray as xr
import hvplot.xarray
import hvplot.pandas
import pandas as pd
import holoviews as hv
from distributed import Client, LocalCluster
hv.extension('bokeh')

Spin up a Dask Cluster¶

cluster = LocalCluster()
client = Client(cluster)
client

distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-ir4zk1dc', purging
distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-xti6v0kt', purging
distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-z1_b971u', purging
distributed.diskutils - INFO - Found stale lock file and directory '/Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-9gganotb', purging

Client

Client-c0f4f414-b03e-11ec-975c-acde48001122

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

91e9e044

Dashboard: http://127.0.0.1:8787/status	Workers: 4
Total threads: 12	Total memory: 16.00 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-f9e46403-61c0-4ba0-b98b-8c522987e89c

Comm: tcp://127.0.0.1:51993	Workers: 4
Dashboard: http://127.0.0.1:8787/status	Total threads: 12
Started: Just now	Total memory: 16.00 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:52007	Total threads: 3
Dashboard: http://127.0.0.1:52008/status	Memory: 4.00 GiB
Nanny: tcp://127.0.0.1:51997
Local directory: /Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-0t99hgjl

Worker: 1

Comm: tcp://127.0.0.1:52010	Total threads: 3
Dashboard: http://127.0.0.1:52011/status	Memory: 4.00 GiB
Nanny: tcp://127.0.0.1:51998
Local directory: /Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-tucc5a43

Worker: 2

Comm: tcp://127.0.0.1:52013	Total threads: 3
Dashboard: http://127.0.0.1:52014/status	Memory: 4.00 GiB
Nanny: tcp://127.0.0.1:51999
Local directory: /Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-_kxqfpi5

Worker: 3

Comm: tcp://127.0.0.1:52004	Total threads: 3
Dashboard: http://127.0.0.1:52005/status	Memory: 4.00 GiB
Nanny: tcp://127.0.0.1:51996
Local directory: /Users/mgrover/git_repos/SAILS-radar-analysis/notebooks/precipitation-exploration/dask-worker-space/worker-z8evlhqf

Read in the data¶

Here, we use data from:

a disdrometer (datastream: gucldM1.b1)
a rain gauge (datastream: gucwbpluvio2M1.a1)

We use all the data available, you can query from this website: https://adc.arm.gov/discovery/#/

disdrometer_ds = xr.open_mfdataset('../../data/disdrometer/*')
gauge_ds = xr.open_mfdataset('../../data/rain-gauge/*')

Visualize the Data¶

We use hvplot here to plot an interactive view of our data, utilizing rasterize to dynamically view our large datasets

from bokeh.models import DatetimeTickFormatter

def apply_formatter(plot, element):
    plot.handles['xaxis'].formatter = DatetimeTickFormatter(hours='%m/%d/%Y \n %H:%M',
                                                            minutes='%m/%d/%Y \n %H:%M',
                                                            hourmin='%m/%d/%Y \n %H:%M',
                                                            days='%m/%d/%Y \n %H:%M',
                                                            months='%m/%d/%Y \n %H:%M')

disdrometer_precip_rate = disdrometer_ds.where(disdrometer_ds.qc_precip_rate == 0).precip_rate.compute()
gauge_precip_rate = gauge_ds.where(gauge_ds.pluvio_status == 0).intensity_rtnrt.compute()

gauge_precip_rate.attrs['long_name'] = disdrometer_precip_rate.attrs['long_name']
gauge_precip_rate['name'] = disdrometer_precip_rate.name

max_precip = 50
disdrometer_rate_plot = disdrometer_precip_rate.hvplot.line(ylim=(0, max_precip),
                                                            label='Disdrometer',
                                                            rasterize=True,
                                                            xlabel='Date/Time',
                                                            title='Disdrometer (gucldM1.b1)').opts(hooks=[apply_formatter])
bucket_rate_plot = gauge_precip_rate.hvplot.line(ylim=(0, max_precip),
                                                 rasterize=True,
                                                 label='Rain Gauge',
                                                 xlabel='Date/Time',
                                                 title='Rain Gauge (gucwbpluvio2M1.a1)').opts(hooks=[apply_formatter])

View our Final Visualization¶

We can combine these into a single column of plots, matching the views/colorbars

(disdrometer_rate_plot + bucket_rate_plot).cols(1)

Look for Periods of Precip¶

We use this example of looking for periods above a threshold in pandas - https://stackoverflow.com/questions/55014867/pandas-threshold-data-sequence-based-on-the-length-of-pattern

The main steps here are:

convert our data to a dataframe
mask of records above baseline (precipitation rate > 0)
label groups of sequenced positive values
check the size of each sequence group
filter values that are positive and the sequence size is lower than n time (minutes)
- determine an “event threshold” value (in minutes)
determine an event as one of these occurences

Convert to a `pandas.DataFrame`¶

Convert our Rain Gauge Data to a pandas.DataFrame For this exercise, we will use a pandas.DataFrame instead of an xarray.Dataset

df = pd.DataFrame({'gauge_precip':gauge_precip_rate.values},
                  index=gauge_precip_rate.time.values)

Filter for Positive Precipitation Values¶

masked_gauge = df > 0
masked_gauge

	gauge_precip
2021-04-23 17:55:00	False
2021-04-23 17:56:00	False
2021-04-23 17:57:00	False
2021-04-23 17:58:00	False
2021-04-23 17:59:00	False
...	...
2022-03-29 20:55:00	False
2022-03-29 20:56:00	False
2022-03-29 20:57:00	False
2022-03-29 20:58:00	False
2022-03-29 20:59:00	False

428505 rows × 1 columns

Separate our data based on consecutive, non-missing values¶

sequence_groups = abs(masked_gauge.astype(int).diff(1).fillna(0)).cumsum().squeeze()
sequence_groups

2021-04-23 17:55:00       0.0
2021-04-23 17:56:00       0.0
2021-04-23 17:57:00       0.0
2021-04-23 17:58:00       0.0
2021-04-23 17:59:00       0.0
                        ...  
2022-03-29 20:55:00    9978.0
2022-03-29 20:56:00    9978.0
2022-03-29 20:57:00    9978.0
2022-03-29 20:58:00    9978.0
2022-03-29 20:59:00    9978.0
Name: gauge_precip, Length: 428505, dtype: float64

Use Groupby to bin our groups, and transform back to the dataframe¶

sequence_size = masked_gauge.groupby(sequence_groups).transform(len)
sequence_size

	gauge_precip
2021-04-23 17:55:00	20032
2021-04-23 17:56:00	20032
2021-04-23 17:57:00	20032
2021-04-23 17:58:00	20032
2021-04-23 17:59:00	20032
...	...
2022-03-29 20:55:00	261
2022-03-29 20:56:00	261
2022-03-29 20:57:00	261
2022-03-29 20:58:00	261
2022-03-29 20:59:00	261

428505 rows × 1 columns

Merge our filters into a single dataframe¶

Currently, these groups/checks are in separate dataframes - let’s bring these together!

df_extended = pd.concat([df, masked_gauge, sequence_groups, sequence_size], axis=1)
df_extended.columns = ['value', 'is_positive', 'sequence_group', 'sequence_size']
df_extended

	value	is_positive	sequence_group	sequence_size
2021-04-23 17:55:00	0.0	False	0.0	20032
2021-04-23 17:56:00	0.0	False	0.0	20032
2021-04-23 17:57:00	0.0	False	0.0	20032
2021-04-23 17:58:00	0.0	False	0.0	20032
2021-04-23 17:59:00	0.0	False	0.0	20032
...	...	...	...	...
2022-03-29 20:55:00	0.0	False	9978.0	261
2022-03-29 20:56:00	0.0	False	9978.0	261
2022-03-29 20:57:00	0.0	False	9978.0	261
2022-03-29 20:58:00	0.0	False	9978.0	261
2022-03-29 20:59:00	0.0	False	9978.0	261

428505 rows × 4 columns

Determine our Event Duration Thresholds¶

As mentioned before, we are interested in finding consecutive periods of precipitation. The question is - how long of a period are we looking for? 5 minutes? 30 minutes? Let’s try out some thresholds, ranging from 5 to 30 minutes, investigating how increasing the threshold by one minute impacts how many groups we end up with.

group_sizes = np.arange(5, 31, 1)

df_list = []
for group_size in group_sizes:
    good_groups = (df_extended.sequence_size >= group_size) & (df_extended.is_positive)
    
    good_groups_df = pd.DataFrame({'time_threshold':str(group_size),
                                   'group_size':len(df_extended[good_groups].sequence_group.unique())},
                                  index=[0])
    df_list.append(good_groups_df)

group_size_df = pd.concat(df_list, ignore_index=True)

Visualize our Threshold Value Options¶

It looks like there is a steep drop off (a lot fewer events) after 11 minutes - 11 minutes looks like the sweet spot before we filter out too many events!

group_size_df.plot.bar(x='time_threshold',
                       y='group_size',
                       xlabel='# of Consecutive Minutes with Precip',
                       ylabel='Count',
                       title=f'Number of Precip Periods Between \n {str(masked_gauge.index.min())} and {str(masked_gauge.index.max())}',
                       legend=False);

../_images/disdrometer-gauge-comparison_28_0.png

Apply our filter to the dataset¶

Now that we have our threshold, let’s apply this to our dataset. We filter for:

Events greater than 11 minutes in duration
Events with positive precipitation

precip_events = df_extended[(df_extended.sequence_size >= 11) & (df_extended.value >= 0)].sequence_group.unique()
precip_events.astype(int)

array([   0,    2,    8, ..., 9962, 9966, 9978])

SAIL Radar Analysis

Compare data from the Disdrometer and Rain Gauge

Contents

Compare data from the Disdrometer and Rain Gauge¶

Spin up a Dask Cluster¶

Client

Cluster Info

LocalCluster

Scheduler Info

Scheduler

Workers

Worker: 0

Worker: 1

Worker: 2

Worker: 3

Read in the data¶

Visualize the Data¶

View our Final Visualization¶

Look for Periods of Precip¶

Convert to a `pandas.DataFrame`¶

Filter for Positive Precipitation Values¶

Separate our data based on consecutive, non-missing values¶

Use Groupby to bin our groups, and transform back to the dataframe¶

Merge our filters into a single dataframe¶

Determine our Event Duration Thresholds¶

Visualize our Threshold Value Options¶

Apply our filter to the dataset¶

Now that we have our “precip_events”, let’s go back to an xarray.Dataset¶

Loop through the different groups and create datasets¶

Compare Disdrometer and Rain Gauge Precipitation Values for Each Event¶

SAIL Radar Analysis

Compare data from the Disdrometer and Rain Gauge

Contents

Compare data from the Disdrometer and Rain Gauge¶

Spin up a Dask Cluster¶

Client

Cluster Info

LocalCluster

Scheduler Info

Scheduler

Workers

Worker: 0

Worker: 1

Worker: 2

Worker: 3

Read in the data¶

Visualize the Data¶

View our Final Visualization¶

Look for Periods of Precip¶

Convert to a pandas.DataFrame¶

Filter for Positive Precipitation Values¶

Separate our data based on consecutive, non-missing values¶

Use Groupby to bin our groups, and transform back to the dataframe¶

Merge our filters into a single dataframe¶

Determine our Event Duration Thresholds¶

Visualize our Threshold Value Options¶

Apply our filter to the dataset¶

Now that we have our “precip_events”, let’s go back to an xarray.Dataset¶

Loop through the different groups and create datasets¶

Compare Disdrometer and Rain Gauge Precipitation Values for Each Event¶

Convert to a `pandas.DataFrame`¶