Are there science and art deserts or do they mirror population alone?
There is a wider distribution than I would have thought. The geographic extent of science and art isn't that different.
Given that Houston is known for certain industries (oil, NASA, health), I expected to see more clustering around the areas best known for them.
With this plot, you mostly see geographic extent. What is the density of events and jobs where the dots overlap? That is hard to see from this.
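One way to get at density where dots overlap is to bin the points into grid cells and count how many land in each cell. The following is only a minimal sketch: the function name `grid_density`, the cell size, and the sample coordinates are all hypothetical and are not taken from either dataset.

```python
import math
from collections import Counter


def grid_density(points, cell_size=0.01):
    """Count how many (lat, lon) points fall into each square grid cell."""
    counts = Counter()
    for lat, lon in points:
        # Flooring the scaled coordinate gives the index of the cell it falls in.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


# Hypothetical coordinates for illustration only.
sample_points = [
    (29.7605, -95.3699),
    (29.7608, -95.3695),
    (29.7215, -95.3910),
    (29.5520, -95.0930),
]

density = grid_density(sample_points)
for cell, n in density.most_common(3):
    print(cell, n)
```

Cells with the largest counts are where overlapping dots hide the most events or jobs. A smaller `cell_size` gives finer detail at the cost of noisier counts.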
There would be a single art cluster and maybe three science clusters (health, oil, space).
The highest-density cluster of values (and also events!) is just south of downtown.
The clusters I expected aren't as dominant as I thought they would be, but there are denser locations just north and west of downtown, which you don't see for art venues. Additionally, you see NASA, which has a lot of science but not much art, sadly.
This plot makes me wonder which companies employ scientists in multiple places and which venues have the most shows!
Can we see neighborhoods of our most common industries?
Yes! Though not as cleanly as I thought might be possible.
I wonder how many companies work in multiple industries. For example, Jacobs works in all three major industries, but especially petroleum and space.
We should see clusters of jobs with totals in the order of health, petroleum, other, and then space.
My predicted order is close to correct, though there are more ads in the "other" column than I expected. Some of the ads categorized as "petroleum" or "space" either belong in both (large engineering companies, for example) or are hard to place. For example, a job description that mentions 'physics' can be ambiguous.
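The ambiguity described above can be illustrated with a small keyword matcher. This is a sketch under stated assumptions, not the categorization actually used for the plot: the keyword lists and the `categorize` function are hypothetical.

```python
import re

# Hypothetical keyword lists; the real lists behind the plot are not given here.
INDUSTRY_KEYWORDS = {
    "health": {"clinical", "hospital", "patient"},
    "petroleum": {"petroleum", "drilling", "reservoir"},
    "space": {"spacecraft", "nasa", "orbital"},
}


def categorize(description):
    """Return a sorted list of every industry whose keywords appear in the ad."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return sorted(
        industry
        for industry, keywords in INDUSTRY_KEYWORDS.items()
        if words & keywords
    )


ads = [
    "Reservoir engineer for offshore drilling",
    "Engineer supporting NASA and petroleum clients",
    "Physics PhD wanted",
]
for ad in ads:
    matches = categorize(ad)
    # An empty list means the ad would land in the "other" column.
    print(ad, "->", matches or ["other"])
```

Returning every match, rather than forcing a single label, makes the overlap visible: an ad from a large engineering company can show up under more than one industry, and an ad that only says 'physics' matches none of them.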
The museums and universities will have a large portion of the events?
The data shows that smaller galleries have some of the highest events-per-venue numbers. This is partly due to their genuinely high numbers and quick turnaround of shows, and partly because after mid-2016 many of the largest museums and galleries were advertised under a single page instead of multiple event pages! This can be seen better in other visualizations.
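One way to check how much a change in listing practice affects events-per-venue numbers is to tally events separately before and after a cutoff date. The sketch below uses made-up venue names and dates, and the cutoff of 1 July 2016 is only an assumed stand-in for "mid-2016".

```python
from collections import Counter
from datetime import date

# Assumed stand-in for "mid-2016"; the exact date of the change is not known.
CUTOFF = date(2016, 7, 1)

# Hypothetical (venue, event date) records for illustration only.
events = [
    ("Small Gallery A", date(2015, 3, 1)),
    ("Small Gallery A", date(2016, 9, 1)),
    ("Small Gallery A", date(2017, 2, 1)),
    ("Large Museum B", date(2015, 5, 1)),
    ("Large Museum B", date(2015, 11, 1)),
    ("Large Museum B", date(2016, 10, 1)),
]


def events_per_venue(records, cutoff=CUTOFF):
    """Return two Counters of events per venue: before and on/after the cutoff."""
    before, after = Counter(), Counter()
    for venue, day in records:
        bucket = before if day < cutoff else after
        bucket[venue] += 1
    return before, after


before, after = events_per_venue(events)
print("before:", dict(before))
print("after:", dict(after))
```

If a large venue's count drops sharply after the cutoff while small galleries stay steady, that points toward a change in how events were recorded rather than a change in how many shows took place.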
It makes me wonder how differences in recording data could affect raw numbers over the history of Glasstire. There is evidence of a change in mid-2016 but no definitive evidence of its exact nature.
In late 2017, I got curious about the spatial distribution of where Science takes place in Houston. I couldn't find any existing maps or databases, so I tried to figure out whether I could generate a dataset from sources meant for a different purpose. I eventually settled on job advertisements. Although no site has a flag for "science jobs", I was able to find a list of terms associated with science jobs. I wrote a program to scrape the job advertisements that a search returned for each word in that list of >100 science-associated words. I got ~5000 results. However, the results were very messy, with a lot of false positives. To give just one example, most "satellite" jobs are at "satellite locations" and have nothing to do with making satellites! Due to limited time and the difficulty of cleaning a large number of text descriptions, the project got put on the shelf (on GitHub).
Inspired by the idea of spatially understanding Art in Houston, I decided to bring that dataset back out, clean it, and see if I could analyze both datasets the same way.
To start out, I labeled 1200 jobs as either "true" or "false" for actually being a science job. I then trained a machine learning model (a random forest) on my labeled data and used it to predict which of the other job descriptions were real science jobs.
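The labeling-then-predicting step can be sketched roughly as follows. This is not the original code: the use of scikit-learn, the TF-IDF text features, and the tiny example ads are all assumptions made for illustration. Only the choice of a random forest comes from the description above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled ads: 1 = real science job, 0 = false positive.
labeled_texts = [
    "Research scientist designing satellite instruments",
    "Geophysicist interpreting seismic data",
    "Laboratory chemist running assays",
    "Cashier at our satellite location",
    "Sales associate for retail store",
    "Receptionist at satellite office",
]
labels = [1, 1, 1, 0, 0, 0]

# Assumed feature step: turn each description into TF-IDF word weights,
# then fit a random forest on those features.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(labeled_texts, labels)

# Hypothetical unlabeled ads to classify.
unlabeled_texts = [
    "Scientist analyzing seismic and satellite data",
    "Store associate at satellite location",
]
predictions = model.predict(unlabeled_texts)
for text, label in zip(unlabeled_texts, predictions):
    print(int(label), text)
```

With only a handful of training examples the predictions here are not meaningful. The point is the shape of the workflow: label a subset by hand, fit on it, then predict on the rest.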
1. Be able to analyze both the science dataset and the art dataset in identical or similar ways. I wanted to let the combination of the two drive the style of my analysis.
2. Wanted the appearance to have a visual theme.
3. Wanted the finished product to be live and editable on the web.
4. Wanted it to be in a place where the general public could further explore the data and ask their own questions.
You can find more information about the generation and cleaning of the Science job dataset I created on its original GitHub repository.
You can find more information about the art venue dataset provided by Glasstire on this site for the hackathon or on their main Glasstire website.
Glasstire, being the online publication for art in Texas for 18 years, has compiled a unique dataset of art venues throughout Houston. The Glasstire DataHack asked participants to search for insights and visualizations.