It is week 16 of my training at The Data School. The time has flown by, and I am now a couple of weeks away from finishing training! To use and expand on the many skills I have learnt in the past few months, I have started to work on a personal project.
For this personal project, I am exploring and analysing bathing water quality in England, as I am an avid open water swimmer! Inspired by the Surfers Against Sewage Safer Seas and Rivers app, I plan to create a dashboard that will show water quality reports from all over the country, investigate historic trends, and allow comparison of locations and appointed sewerage undertakers. To do this, I am planning to extract data from an API, clean it in Alteryx, and visualise it in a Tableau dashboard.
Users
This project aims to help open water swimmers identify whether open water locations are safe to swim at (from a water quality perspective). It would allow them to dig deeper into why a location is or is not safe, see historic trends, and see which parties are responsible. This will give users a better understanding of their swim spot and help them stay safe.
The Data
The data I am using is the government bathing water quality dataset. It offers historic data as a download, as well as an API. The historic data is a lot easier to obtain; however, it is static and would quickly go out of date, though you would still be able to analyse historic activity. Using the API would allow for regular updates that keep the dashboard current, though there is a lot of documentation to understand, and there are multiple models with multiple APIs depending on what you want to know.
The Plan
To start off with, I have downloaded the historic static data to understand what sort of fields are available for me to use. From this, I am choosing which fields are useful for this project. At the moment, I have identified these to be:
- The site (bathing water), its location and its appointed sewerage undertaker
- The sample date and time, and the levels of E. coli found
- The annual classification of different sites (whether it is excellent, good, sufficient, or poor)
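As a rough illustration of narrowing the static download to the fields above, here is a minimal Python sketch. I plan to do this step in Alteryx, so this is only a way to think about the shape of the data. The column names (`site_name`, `sewerage_undertaker`, `sample_datetime`, `e_coli_count`) and the example rows are assumptions made up for illustration; the real headers in the download may well differ. Location and annual classification are left out to keep the sketch short.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

# Hypothetical column names: the real headers in the download may differ.
SITE_COL = "site_name"
UNDERTAKER_COL = "sewerage_undertaker"
SAMPLE_TIME_COL = "sample_datetime"
ECOLI_COL = "e_coli_count"


@dataclass
class Sample:
    site: str
    undertaker: str
    taken_at: datetime
    e_coli: int


def read_samples(csv_file):
    """Read an open CSV file and keep only the fields of interest."""
    samples = []
    for row in csv.DictReader(csv_file):
        samples.append(
            Sample(
                site=row[SITE_COL].strip(),
                undertaker=row[UNDERTAKER_COL].strip(),
                taken_at=datetime.fromisoformat(row[SAMPLE_TIME_COL]),
                e_coli=int(row[ECOLI_COL]),
            )
        )
    return samples


# Made-up rows for illustration only, not real measurements.
EXAMPLE_CSV = """site_name,sewerage_undertaker,sample_datetime,e_coli_count,other
Example Beach,Example Water,2023-06-01T10:30:00,12,x
Example Lake,Another Water,2023-06-02T09:00:00,250,y
"""

samples = read_samples(io.StringIO(EXAMPLE_CSV))
for s in samples:
    print(s.site, s.undertaker, s.taken_at.date(), s.e_coli)
```

Any extra columns (like `other` here) are simply ignored, which mirrors the idea of choosing only the fields that are useful for the project.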
Once I have explored the data thoroughly, I will sketch the graphs I want to include to provide a deeper analysis and think about how they will work together on a dashboard. When I have identified all the fields that I think are needed for this project, I will investigate how to retrieve them using the API, as this would ensure my dashboard can be kept up to date (as I mentioned above). I will then build the charts in Tableau and put them together into my planned dashboard.
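To get a feel for the kind of historic trend I might chart, here is a small Python sketch that counts annual classifications per year and works out the share of sites rated at a chosen level or better. It is only a sketch under assumptions: the function names and the `(year, site, classification)` record shape are hypothetical, the records are made up, and the actual charts will be built in Tableau.

```python
from collections import Counter, defaultdict

# Annual classifications from best to worst, as listed in this post.
CLASSES = ["excellent", "good", "sufficient", "poor"]


def classification_counts_by_year(records):
    """Count classifications per year.

    records: iterable of (year, site, classification) tuples.
    """
    counts = defaultdict(Counter)
    for year, _site, classification in records:
        counts[year][classification.lower()] += 1
    return counts


def share_at_least(counts, year, threshold="good"):
    """Fraction of sites in `year` classified at `threshold` or better.

    Returns None when there are no records for that year.
    """
    accepted = CLASSES[: CLASSES.index(threshold) + 1]
    total = sum(counts[year].values())
    if total == 0:
        return None
    return sum(counts[year][c] for c in accepted) / total


# Made-up records for illustration only, not real classifications.
EXAMPLE = [
    (2021, "Site A", "Good"),
    (2021, "Site B", "Poor"),
    (2022, "Site A", "Excellent"),
    (2022, "Site B", "Sufficient"),
    (2022, "Site C", "Good"),
    (2022, "Site D", "Good"),
]

counts = classification_counts_by_year(EXAMPLE)
for year in sorted(counts):
    print(year, dict(counts[year]), share_at_least(counts, year))
```

The same grouping could be keyed on the sewerage undertaker instead of the year to compare responsible parties.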