Not only did my colleagues and I survive the notorious Dashboard Week, but the last day was, dare I say, quite fun! We were given a list of Hudson River Valley Heritage Sites and asked to find the latitude and longitude of each one so that we could finally build an analysis of where these sites are located. As a Dutch person who has visited many of these sites to explore their Dutch history, I was excited to jump right in.
Simple web scraping involves identifying a site's URL structure and feeding those URLs into a tool that pulls the data from each page. I ran into blockers right away: some site URLs arbitrarily have a hyphen after the site name, as in the images below:


Given today's tight deadline, I decided to retrieve data only from links I could feed into Alteryx correctly, which I managed for a handful of sites by fixing their URLs with the Find and Replace tool.
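Outside Alteryx, the same find-and-replace idea can be sketched in Python by generating both URL variants for each site and trying each one. This is a minimal sketch that assumes a hypothetical base URL and assumes each page's slug is the lowercased site name with spaces turned into hyphens, optionally followed by the stray trailing hyphen; the real site's pattern may differ.

```python
import re

# Hypothetical base URL; the real heritage-site URL pattern is an assumption here.
BASE_URL = "https://example.org/sites/"


def slugify(site_name: str) -> str:
    """Lowercase the name and collapse each run of non-alphanumeric characters into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", site_name.lower())
    return slug.strip("-")


def candidate_urls(site_name: str) -> list[str]:
    """Return the plain URL and the variant with a trailing hyphen after the slug."""
    slug = slugify(site_name)
    return [f"{BASE_URL}{slug}", f"{BASE_URL}{slug}-"]


print(candidate_urls("James Vanderpoel House"))
# ['https://example.org/sites/james-vanderpoel-house', 'https://example.org/sites/james-vanderpoel-house-']
```

Each candidate could then be requested in turn, keeping whichever one returns a real page, instead of correcting the hyphenated links by hand.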

As shown in the screenshot above, I then pulled the data with the Download and Text to Columns tools. Alteryx was able to pull almost all of the data about the sites except what I really needed: their addresses! When I searched the downloaded source code in Alteryx for a site's address, I couldn't find anything. For example, the James Vanderpoel House address was nowhere to be found. When I inspected the webpage in my browser, however, I was able to find the source code containing the address. Importantly, I realized the address was likely "hidden" in a place that Alteryx couldn't see. In the screenshot below, the inspect panel shows where the address lives: inside this <li> element.

The caret on the left expands the element to reveal the nested lines of code where the address is embedded.
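If the address does appear in the HTML you have (for example, copied out of the browser's inspector rather than fetched by the Download tool), pulling the text out of a nested <li> element can be sketched with Python's standard-library html.parser. The markup below, including the "address" class name and the placeholder address, is a hypothetical stand-in for the real page, and the parser is a simple tag-event collector rather than a full DOM.

```python
from html.parser import HTMLParser


class LiTextCollector(HTMLParser):
    """Collect the text inside every <li> whose class list includes target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0     # > 0 while inside a matching <li>; counts nested <li> levels
        self.current = []  # text fragments for the matching <li> being read
        self.results = []  # one joined string per matching <li>

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        if self.depth > 0:
            self.depth += 1  # a nested <li> inside a matching one
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag != "li" or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:  # closed the outermost matching <li>
            self.results.append(", ".join(self.current))

    def handle_data(self, data):
        # Keep only non-whitespace text that sits inside a matching <li>.
        if self.depth > 0 and data.strip():
            self.current.append(data.strip())


# Hypothetical markup with a placeholder address, not the real page.
SAMPLE_HTML = """
<ul>
  <li class="address">
    <span class="street">123 Example Street</span>
    <span class="city">Anytown, NY 12345</span>
  </li>
</ul>
"""

parser = LiTextCollector("address")
parser.feed(SAMPLE_HTML)
print(parser.results)  # ['123 Example Street, Anytown, NY 12345']
```

If the address never shows up in the downloaded HTML at all, a parser like this will find nothing either, which is consistent with switching to a geocoding service instead.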

After this discovery, I decided it would be best to pivot to other online tools to find each site's latitude and longitude. Enter Geocodio, today's lifesaver! I easily uploaded a spreadsheet with my site information, and Geocodio appended the spatial information to it.
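Once the geocoded spreadsheet comes back, it helps to check which rows actually received coordinates before mapping them. This is a small sketch that assumes the export is a CSV with a hypothetical "Site Name" column plus appended "Latitude" and "Longitude" columns; the actual column names in the export may differ, and the sample rows are made up.

```python
import csv
import io


def load_points(csv_text: str) -> list[tuple[str, float, float]]:
    """Return (site name, latitude, longitude) for every row that has coordinates.

    Assumes hypothetical columns "Site Name", "Latitude", and "Longitude".
    """
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = (row.get("Latitude") or "").strip()
        lon = (row.get("Longitude") or "").strip()
        if not lat or not lon:
            continue  # skip sites the geocoder could not place
        points.append((row["Site Name"], float(lat), float(lon)))
    return points


# Made-up sample rows; Site B has no coordinates and is skipped.
sample = """Site Name,Address,Latitude,Longitude
Site A,1 Example Rd,42.10,-73.80
Site B,2 Example Rd,,
"""
print(load_points(sample))  # [('Site A', 42.1, -73.8)]
```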
With the spatial data, I was finally able to go into Tableau and build the map I had envisioned. I created points for both the Heritage Sites and The Information Lab's office, then used the makeline() and length() functions to calculate the distance from each Heritage Site to our office.
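To sanity-check a few of those distances outside Tableau, a great-circle (haversine) calculation between two latitude/longitude points is a common approximation. This is a swapped-in technique, not necessarily how Tableau computes a line's length: it treats the Earth as a sphere with an assumed mean radius, so small differences from Tableau's figures are expected.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius (spherical approximation)


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# One degree of longitude along the equator is about 69.09 miles on this sphere.
print(round(haversine_miles(0.0, 0.0, 0.0, 1.0), 2))  # 69.09
```

Running each site's coordinates against the office's coordinates through a function like this gives a quick cross-check on the distances shown on the map.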
