Week 6 August 2nd- August 6th

This week with our second mentor Eugene, we reviewed our last two experiments. On a shared Google docuement we made a summary of each experiment, and what we learned from them. **Trade-off between site # and visit # **__ **Experiment Design - **

Assuming closure holds, what trade offs should we consider # of sites and # of sites/visit
We used one generated ground truth dataset to repeatedly change the # of visits and sites by partitioning the data.
We start with a site of size M and 16 visits, we then double the number of sites (2M) and half the number of visits, to ensure the number of ‘datapoints’ stays the same even while we are changing the grouping of the ‘datapoints’.
Simulation settings: 200 sites / 2 visits 100 sites / 4 visits 50 sites / 8 visits 25 sites / 16 visits
We will run with 4 different simulation settings, with 10 replications of this experiment, and with 4 different sites M.

**Results (plots) - **

Observations:

For many runs we found the 25 site / 16 visit results to be an outlier for the occupancy plot.
We found that between 50 sites/8 visits and 100 sites/ 4 visits gave us the best results for the detection plot.
We observed that the occupancy plot looks similar to a downward curve for most runs.

Learning points -

We believe that having 25 sites/16 visits is redundant and possibly does not have a good balance between # sites and # of visits, which can cause it to be an outlier as seen in the plots.
For our second observation, we think that these combination of #sites/#visits is a good balance of sites/visits, so that we do not have some combination of many sites and low # of visits, or low # of sites and high # of visits.

Questions we have: Why is the occupancy plot so high (for 25 sites) but in the detection plot our rmse value for 25 sites is the second lowest?

The effect of a wrong site clustering__ Experiment Design:

How bad will the estimates for occupancy and detection be if the data is grouped by merging sites? Is there an instance in which the merged data has a lower estimate?
We generated a ground truth dataset
For the wrong clustering, the number of sites was cut down to half and the number of visits was doubled. We did this by combining every two two sites into one. ground truth data wrong site clustering dataset
The values of each covariate in the wrong clustering dataset was the average of the covariates in the ground truth dataset

Results (plots):

**Observations - **

Merged data slightly affects more detection than occupancy in 200-2 and 100-4 scenarios
Detection is better with 200 sites and 2 visits than with 100 sites and 4 visits.
There is almost no change between 200sites-2visits and 100sites-4visits for occupancy
50 sites-8 visits looks like the best scenario with the lowest estimate for occupancy (?)
For 50-8 detection with the merged data is the most affected. The merged detection RMSE value is a very high outlier. **Learning points - **
Repeated visits may affect the surroundings of the species. Therefore having less visits the most likely the species is going to stay in that environment therefore is it more likely to detect the species.
There is a good balance between the # of sites and # of visits
Detection is affected more than occupancy utilizing merged data

Written on August 6, 2021