To become good at anything, there is one thing you need to do: practice, practice… and then practice some more. When that something is data analysis, however, you also need data sets to try things out.
In my two previous posts, you may have seen that I randomly generated data just to build pivot tables and charts. The purpose there, however, was just to go through the process of creating the visualizations. When you want to go a little deeper into identifying trends and using statistics, what you actually need is real sample data.
While I was upskilling myself, I ran into this roadblock repeatedly, so I decided to dedicate some time to researching where one can find suitable data.
The cool thing here: the governments of this world and other organisations have dedicated departments for this purpose, and their data are very often openly available! They may not always be in exactly the right format, but hey, converting data is also part of the job.
Below is a range of websites I found where you can find very interesting datasets to play with. If you are not from the UK or the US, I am sure similar institutions exist in your part of the world.
The city of New York decided to make their data public and sponsored several competitions to make apps for the city that use these data sets. Apparently, you can also find big data sets here.
The topics here range from business and education all the way to transportation. This website is plain awesome and you have a huge selection of datasets to work with!
The EU’s statistical office, based in Luxembourg, gives you a whole range of datasets on topics from industry and agriculture to education and science. Feel free to browse 🙂
If you need demographic data, this is a nice website with a plethora of information. Just go to their Data & Statistics section for more.
The self-declared goal of this site is to make the data of the World Bank easily accessible and free of charge.
Here, you have yet again a huuuge amount of information at your disposal. What took my fancy were the gender statistics, which are available as an xls file.
The government of the United Kingdom put all their departments under one domain name: https://www.gov.uk. And believe it or not, they put a huge number of statistical data sets up there as well.
Care for an example? How about the historical data on gas consumption in the UK from 1920 to 2016?
The Freie Universität Berlin gives open access to datasets in their own wiki so that students can do some learning on their own. In their own words, “In contrast to many statistics books, in which only demonstrative examples can be given, fu:stat:thesis has the purpose of motivating others to apply and understand methods and examples on their own.”
The main drawback of this site for most people is that it is only in German. For all you German-speakers out there: from what I have seen, this website mostly has data from social science surveys…
I came across this office thanks to John Oliver’s discussion of the American Health Care Act and its implications for the American people.
As the name implies, the office annually collects budget and economic data and actually supplies them in Excel files online.
I only used them to play around with the Excel formatting options, as the columns were “wrong”, but I am sure you can get a lot out of them.
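If you would rather fix “wrong” columns outside of Excel, a few lines of Python can do it. The sketch below is only an illustration: it assumes that “wrong” means a wide layout with one column per year, the numbers are made up, and `wide_to_long` is a hypothetical helper name, not anything the office provides.

```python
import csv
import io

# Made-up sample in a "wide" layout: one row per category, one column per year.
# The names and numbers are purely illustrative, not real budget data.
WIDE_CSV = """category,2015,2016,2017
Revenues,100,110,120
Outlays,130,135,140
"""


def wide_to_long(wide_text):
    """Turn one-column-per-year rows into (category, year, value) records."""
    reader = csv.DictReader(io.StringIO(wide_text))
    records = []
    for row in reader:
        category = row["category"]
        for column, value in row.items():
            if column == "category":
                continue  # the label column is not a year
            records.append(
                {"category": category, "year": int(column), "value": float(value)}
            )
    return records


long_rows = wide_to_long(WIDE_CSV)
for record in long_rows:
    print(record["category"], record["year"], record["value"])
```

With one record per category and year, the data is much easier to feed into a pivot table or a chart. A real file would first need to be saved as CSV, and its header names will differ from the ones assumed here.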
If you are serious about learning how to analyse data, a lack of data sets is not an excuse. The official bodies of this world alone provide you with large amounts of data on many different topics. Once you get past those, you will find even more.
While researching this topic, I came across two more interesting websites that have even more information. Rather than deleting my entire effort here, I chose to direct you to those sites as well; just see the references below.
Do you know of another great website for data sets to practice with? If so, share it in the comment section!
http://www.kdnuggets.com/datasets/index.html – Excellent summary of datasets for data mining and data science
https://aws.amazon.com/public-datasets/ – Amazon Web Services provides a large number of data sets