Week 1: June 28th - July 2nd
During this week, I was given several papers related to the project that my peer, Demetrius Hernandez, and I will be working on during the summer. The first paper we were given is titled: On the Role of Spatial Clustering Algorithms in Building Species Distribution Models from Community Science Data by Mark Roth. The link to his presentation about this project can be found here: https://recorder-v3.slideslive.com/?share=42219&s=5bd64fd0-d8ce-4c16-b0a3-7c4a68d28a55 This paper gave us insight into the kind of data we will be working with and the different challenges we may encounter during our research. We were first introduced to Occupancy Modelling, which “allows to simultaneously estimate the probability that a species occupies a location and the probability that the observer detects the species given that it is present.” This will be the focus of our final project. For analyzing the data, the paper suggests “defining a site as a set of two to ten checklists (list of species recorded during one period) submitted by the same observer at the same exact latitude-longitude coordinates.”
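To check my understanding of the site definition quoted above, here is a minimal sketch of how checklists could be grouped into sites. The field names (`observer`, `lat`, `lon`, `detected`) and the function `group_into_sites` are hypothetical and not taken from the paper or from eBird. The quote does not say what to do when an observer has more than ten checklists at one location, so keeping only the first ten is my own assumption.

```python
from collections import defaultdict


def group_into_sites(checklists, min_size=2, max_size=10):
    """Group checklists into sites keyed by (observer, latitude, longitude).

    Hypothetical helper: a site is kept only if it has at least `min_size`
    checklists, following the "two to ten checklists" definition.
    """
    groups = defaultdict(list)
    for checklist in checklists:
        key = (checklist["observer"], checklist["lat"], checklist["lon"])
        groups[key].append(checklist)

    sites = {}
    for key, members in groups.items():
        if len(members) < min_size:
            continue  # a single visit gives no repeat-visit information
        # Assumption: when there are more than `max_size` checklists,
        # keep only the first `max_size` of them.
        sites[key] = members[:max_size]
    return sites


# Made-up example records, only for illustration.
checklists = [
    {"observer": "obs1", "lat": 44.56, "lon": -123.28, "detected": 1},
    {"observer": "obs1", "lat": 44.56, "lon": -123.28, "detected": 0},
    {"observer": "obs2", "lat": 44.56, "lon": -123.28, "detected": 1},
]
sites = group_into_sites(checklists)
```

In this made-up example, only the two checklists from `obs1` form a site; the single checklist from `obs2` is dropped because it has no repeat visit at the same coordinates.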
Our second paper is titled: Best practices for making reliable inferences from citizen science data: case study using eBird to estimate species distributions by A. Johnston. It explains in detail how citizen science data is valuable for a wide range of ecological research questions, but also how using it presents challenges. It also explains how covariates can differ between sites and between hours of the day, which has a great impact on the data recorded. While reading this paper, one of my concerns about recording data was counting the same species twice, which can cause errors in the models and, in the long run, produce incorrect results. A new term that caught my attention while reading this paper was Generalized Additive Model (GAM), which I was told is related to logistic regression: https://searchbusinessanalytics.techtarget.com/definition/logistic-regression
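As far as I understand the relationship, logistic regression models the log-odds of an outcome as a linear function of the covariates, while a GAM with a logit link replaces each linear term with a smooth function of that covariate. The symbols below (p for the probability, x_j for the covariates, beta_j for the coefficients, f_j for the smooth functions) are my own notation and are not taken from the paper.

```latex
\begin{align}
\text{logistic regression:}\quad \log\frac{p}{1-p} &= \beta_0 + \sum_{j} \beta_j x_j \\
\text{GAM (logit link):}\quad \log\frac{p}{1-p} &= \beta_0 + \sum_{j} f_j(x_j)
\end{align}
```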
One of the papers on which we spent more time than usual, because of the newly introduced math, is titled: Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One by Darryl I. MacKenzie. This paper helped us understand how occupancy models work and how to get the best results, like “increasing the number of visits to improve the precision of the estimated occupancy rate.” After analyzing this paper with my peers, a concern came to mind: using citizen science data will most likely lead to false positives/negatives throughout the data; however, restricting the data to records from experts would not leave enough data for a good estimate of the occupancy rate. So what could be a good solution to this problem?
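To check my understanding of the occupancy model, here is a minimal sketch of its simplest form. It assumes a single occupancy probability `psi` and a single detection probability `p` shared by all sites and visits, and it assumes no false positives. The function names, the simulated data, and the grid search used to find the best-fitting values are my own choices for illustration, not the paper's procedure.

```python
import math
import random
from collections import Counter


def history_likelihood(detections, visits, psi, p):
    """Likelihood of one site's history: `detections` hits in `visits` surveys."""
    occupied_term = psi * p**detections * (1 - p) ** (visits - detections)
    if detections > 0:
        return occupied_term  # detected at least once, so the site is occupied
    # Never detected: either occupied but missed every time, or unoccupied.
    return occupied_term + (1 - psi)


def log_likelihood(counts, visits, psi, p):
    """`counts[d]` is the number of sites with exactly d detections."""
    return sum(
        n * math.log(history_likelihood(d, visits, psi, p))
        for d, n in counts.items()
    )


def fit(counts, visits, step=0.01):
    """Grid search over (psi, p) in (0, 1); a simple stand-in for an optimizer."""
    grid = [i * step for i in range(1, round(1 / step))]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda pair: log_likelihood(counts, visits, pair[0], pair[1]),
    )


def simulate(n_sites, visits, psi, p, rng):
    """Simulate detection counts per site under the same assumptions."""
    counts = Counter()
    for _ in range(n_sites):
        occupied = rng.random() < psi
        hits = sum(rng.random() < p for _ in range(visits)) if occupied else 0
        counts[hits] += 1
    return counts


rng = random.Random(0)
counts = simulate(n_sites=500, visits=5, psi=0.6, p=0.4, rng=rng)
psi_hat, p_hat = fit(counts, visits=5)
# Naive estimate: share of sites with at least one detection. It ignores
# occupied sites where the species was missed on every visit.
naive = 1 - counts[0] / 500
```

Comparing `naive` with `psi_hat` shows why detection has to be modelled: the naive share can only count sites where the species was seen at least once, so it tends to sit below the true occupancy rate when `p` is less than one.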