COVID-19 Daily Update: Automation Progress
Today is a model discussion day. I will give a pandemic status update tomorrow.
I have been working recently on automating the process of gathering data every morning. Frankly, I'm surprised it took me this long.
Previously, I had been copying data by hand from various sources into my spreadsheet. For those interested, here's a link to my spreadsheet with all my old data.
As nice as it was to have all the data (and figures) in one place, where everyone could look at it and intuitively see how I did each calculation, it became unwieldy as I began to track (and correlate) a larger set of data, and it simply took too much of my time.
I spent yesterday and this morning writing Python code to automate this process.
For those interested, the code is available on GitHub: https://github.com/jlc42/JLC-COVID-19-Tools
This is a very preliminary work in progress. Currently, it has scripts to gather data, which I can run each day. I still need to write some code to automatically parse some of the web pages I scrape, but at least the data is saved. (Some data, notably from Georgia and Texas, disappears if you don't scrape it daily. CovidTracking saves some of it each day, but other pieces, namely the antigen and serology testing data, are simply lost if I don't scrape them myself every day. Texas puts those in its spreadsheet, but only as "daily" values for the antigen and serology tests, so each day's numbers are 'wiped out' by the next day's numbers.)
I can now put that on a cron job and have it run every night while I am sleeping, so the data will all be saved.
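Because some of these pages only ever show the latest day, the essential step is saving a dated copy of each raw page every night before it changes. Here is a minimal sketch of that idea, assuming each source is a single downloadable URL. The source names, URLs, and folder layout are hypothetical placeholders, not what the repository actually uses:

```python
import datetime
import pathlib
import urllib.request


def snapshot_path(out_dir, name, day=None, suffix=".html"):
    """Dated location for one source's snapshot, e.g. data/tx/2020-06-01.html."""
    day = day or datetime.date.today()
    return pathlib.Path(out_dir) / name / f"{day.isoformat()}{suffix}"


def save_snapshot(url, out_dir, name, suffix=".html", day=None):
    """Download `url` at most once per day and keep the raw bytes.

    An existing snapshot for the same day is never overwritten, so the first
    copy of a page whose "daily" values get replaced is preserved.
    """
    path = snapshot_path(out_dir, name, day, suffix)
    if path.exists():
        return path
    with urllib.request.urlopen(url, timeout=60) as resp:
        data = resp.read()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def save_all(sources, out_dir="data"):
    """Snapshot every source; one failing download does not stop the rest."""
    saved = {}
    for name, (url, suffix) in sources.items():
        try:
            saved[name] = save_snapshot(url, out_dir, name, suffix)
        except OSError as err:  # URLError and timeouts are OSError subclasses
            print(f"failed to fetch {name}: {err}")
    return saved


# Hypothetical source list: names and URLs are placeholders only.
SOURCES = {
    "texas_tests": ("https://example.org/texas-tests.xlsx", ".xlsx"),
    "georgia_dashboard": ("https://example.org/georgia.html", ".html"),
}
```

Calling `save_all(SOURCES)` from a script scheduled with a crontab line such as `30 2 * * * cd /path/to/project && python3 snapshot.py` (path and script name hypothetical) would add one file per source per night, leaving the parsing for later.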
Next, I need to write some code to go through all the saved data and organize it into the useful fields I track. That's a larger task, and it will take a LOT of time. For NOW, I have it pulling down (and organizing) the US state data from CovidTracking, and then building a single plot...
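Organizing the CovidTracking pull mostly means grouping rows by state and sorting each state's rows by date. Here is a sketch, assuming the records have already been loaded from the COVID Tracking Project's per-state daily JSON and carry the fields `date` (an integer such as 20200601), `state`, `positiveIncrease`, and `totalTestResultsIncrease`. Check those field names against the actual download before relying on them; the output keys `new_cases` and `new_tests` are hypothetical:

```python
import collections
import datetime


def parse_date(value):
    """Convert an integer date like 20200601 into a datetime.date."""
    s = str(value)
    return datetime.date(int(s[:4]), int(s[4:6]), int(s[6:8]))


def organize_by_state(records):
    """Group raw daily records by state, keeping only the fields plotted.

    Missing or null counts become 0, and each state's rows are sorted by date.
    """
    by_state = collections.defaultdict(list)
    for rec in records:
        by_state[rec["state"]].append({
            "date": parse_date(rec["date"]),
            "new_cases": rec.get("positiveIncrease") or 0,
            "new_tests": rec.get("totalTestResultsIncrease") or 0,
        })
    for rows in by_state.values():
        rows.sort(key=lambda row: row["date"])
    return dict(by_state)
```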
So... after two days of work... drum roll please.... here's my single plot, automatically generated for me this morning:
And there it is, in all its glory. Here's the same figure plotted from my spreadsheet:
Side note: things ARE improving in New Mexico. Yes, tests are down along with cases, but that is because the DEMAND for tests is falling as true infections fall. You can tell because the % of tests that come back positive is also falling.
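The percent-positive check can be made concrete. Here is a sketch that computes positivity as new positive results divided by new test results, smoothed with a 7-day trailing sum. The smoothing window is an assumption on my part, not necessarily what the spreadsheet figure uses:

```python
def trailing_sum(values, window=7):
    """Sum of the last `window` values at each day; None until the window is full."""
    sums = []
    total = 0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the day that left the window
        sums.append(total if i >= window - 1 else None)
    return sums


def percent_positive(new_positives, new_tests, window=7):
    """Percent of test results that were positive over each trailing window."""
    positives = trailing_sum(new_positives, window)
    tests = trailing_sum(new_tests, window)
    return [
        None if p is None or not t else 100.0 * p / t
        for p, t in zip(positives, tests)
    ]
```

If daily tests drop while this percentage also drops, fewer people are seeking tests. If tests dropped while the percentage rose, that would instead point to under-testing.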
Back to automation...
This single figure may not look like much, HOWEVER, with a simple loop, tomorrow I will be ready to produce this figure for EVERY state in the US.
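The "simple loop" could look roughly like this, assuming the data is already organized as a mapping from state code to date-sorted rows with hypothetical `new_cases` and `new_tests` keys. The drawing itself is passed in as a `draw` callback, a placeholder for whatever function renders the single-state figure:

```python
def figure_series(rows):
    """Turn one state's date-sorted rows into the series shown on the figure."""
    dates = [row["date"] for row in rows]
    cases = [row["new_cases"] for row in rows]
    tests = [row["new_tests"] for row in rows]
    # Daily percent positive; None on days with no reported tests.
    pct = [100.0 * c / t if t else None for c, t in zip(cases, tests)]
    return {"dates": dates, "cases": cases, "tests": tests, "pct_positive": pct}


def make_all_state_figures(data_by_state, draw):
    """Call `draw(state, series)` once per state, in alphabetical order.

    States with no rows are skipped. Returns the states that were drawn.
    """
    made = []
    for state in sorted(data_by_state):
        rows = data_by_state[state]
        if not rows:
            continue  # nothing to plot for this state
        draw(state, figure_series(rows))
        made.append(state)
    return made
```

For instance, `draw` could be a small matplotlib function that plots the three series and calls `fig.savefig(f"{state}.png")`, so one run writes one image per state.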
And with a little bit of additional work, some parsing and gathering of data from my other sources, combined into a single data-file, I will be able to do this for every country as well.
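Combining several sources into a single data file comes down to choosing a shared key. Here is a sketch, assuming each source has already been parsed into a dictionary keyed by `(location, date)`. The merge rule (an earlier source wins when two disagree, later sources only fill gaps) is an assumption, not a decision made in the original code:

```python
import csv
import io


def merge_sources(*sources):
    """Merge several {(location, date): {field: value}} mappings into one.

    Later sources fill in fields that earlier ones lack; they never
    overwrite a value that is already present.
    """
    merged = {}
    for source in sources:
        for key, fields in source.items():
            row = merged.setdefault(key, {})
            for name, value in fields.items():
                row.setdefault(name, value)
    return merged


def combined_csv_text(merged):
    """Render merged rows as CSV text with one column per field seen anywhere."""
    fields = sorted({name for row in merged.values() for name in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["location", "date", *fields], lineterminator="\n")
    writer.writeheader()
    for (location, date), row in sorted(merged.items()):
        writer.writerow({"location": location, "date": date, **row})  # missing fields -> ""
    return buf.getvalue()
```

Writing the result to disk is then a single `pathlib.Path("combined.csv").write_text(combined_csv_text(merged))` (file name hypothetical).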
The case data and testing data are already available from other sources, but the next step will be to do some of the unique calculations I sometimes do, like my estimates of the % infected, the % currently infected, etc. That's where the real potential lies. At that point, I may want to think about hosting this somewhere people can see updated estimates of all these quantities for whatever location they are in.
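As a rough illustration of the kind of estimate meant here, and not necessarily the method behind the estimates on this blog, a back-of-the-envelope version scales confirmed cases by an assumed undercount factor and divides by population. Both `undercount_factor` and the 14-day `infectious_days` window are hypothetical parameters that would have to be chosen and justified separately:

```python
def percent_infected(cumulative_cases, population, undercount_factor):
    """Estimated % of the population ever infected.

    undercount_factor: ASSUMED ratio of true infections to confirmed cases.
    """
    return 100.0 * cumulative_cases * undercount_factor / population


def percent_currently_infected(daily_new_cases, population, undercount_factor,
                               infectious_days=14):
    """Estimated % currently infected, counting only cases from the last
    `infectious_days` days (an ASSUMED duration of infection)."""
    recent = sum(daily_new_cases[-infectious_days:])
    return 100.0 * recent * undercount_factor / population
```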