Author Archives: Johannes Vogel

Twitter Feed Download and Analysis Part II: Tweet Management

In my last post[1], I described my approach to downloading tweets from various Twitter channels using Python[2] (version 3.6) and the tweepy library[3]. In this post, I would like to build on that by looking into the downloaded tweet …

Posted in Data Science, Data Wrangling, Pandas, Tutorial

Twitter Feed Download and Analysis Part I: Download of Tweets

Over the last few weeks and months, I have been reading up on natural language processing. I wanted to apply my newly learnt skills, but I needed a source of texts to analyze. As a …

Posted in Python, Tutorial

Naive Bayes explained naively

To me, reading about naive Bayes is like following a very logical train of thought. Without really knowing (or caring) whether this is an accurate description, I call something like this a logic chain… and it goes …

Posted in Data Science
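
The chain of reasoning behind naive Bayes can be sketched in a few lines. Below is a minimal multinomial naive Bayes text classifier with add-one (Laplace) smoothing in plain Python; it illustrates the general technique, not necessarily the approach taken in the post, and the training sentences and labels are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

def train(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(doc, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior (up to a constant)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(class): the prior
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            # log P(word | class) with add-one smoothing; the "naive" step is
            # treating every word as independent given the class
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data
docs = ["great match today", "fantastic goal great win",
        "rain and cold again", "grey cold morning"]
labels = ["sport", "sport", "weather", "weather"]
model = train(docs, labels)
print(predict("great goal", *model))
print(predict("cold rain", *model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.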

Create an automated ML model evaluation report with scikit-learn, matplotlib and python-docx

As my knowledge of machine learning grew, I ended up writing code for several different models several times over. To make it easier to judge which model performs best, I wanted an automated ML model evaluation report that …

Posted in Data Science, Machine Learning, Python
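
The core of such a report is fitting several models on the same split and collecting the same metrics for each. The sketch below covers only that step with scikit-learn; the iris dataset, the two models and the chosen metrics are assumptions for illustration, and the matplotlib plots and python-docx output mentioned in the title are left out.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and compute the same metrics on the same test split."""
    report = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        report[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "macro_f1": f1_score(y_test, y_pred, average="macro"),
        }
    return report

# Example dataset and models (assumptions, not taken from the post)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
report = evaluate_models(models, X_train, X_test, y_train, y_test)
for name, metrics in report.items():
    print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in metrics.items()))
```

Keeping the split and metrics fixed in one function is what makes the numbers comparable across models; the resulting dictionary could then be written into a table or document.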

Kaggle Learnings – Exploratory Data Analysis & Data Cleaning

I am a strong believer in worked examples and case studies. Theory is all well and good, but without applying it to a real use case, it can be quite a pointless exercise. Today on Kaggle, I came across a worked …

Posted in Data Cleaning, Data Science, Data Wrangling, Machine Learning

Linear Regression in a Nutshell

Explaining linear regression using the ordinary least squares method appears to be a bit of a rite of passage in data science, judging by the number of entries one can find on the web. True enough, it has the same …

Posted in Data Science, Machine Learning, Theory
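
For a single predictor, ordinary least squares has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept places the line through the point of means. A minimal plain-Python sketch, with hypothetical data:

```python
def ols_fit(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares (one predictor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    # b0 makes the fitted line pass through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
b0, b1 = ols_fit(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f} * x")
```

These are exactly the coefficients that minimise the sum of squared residuals; with more than one predictor, the same idea is usually expressed with matrices.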

How to create a stacked bar chart with Python and matplotlib

Extracting and analysing data very often falls to people who will not necessarily make decisions based on it. Typically, this means that you, the data analyst, need to present the data in a clear and concise …

Posted in Data Science, Pandas, Tutorial
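
One common way to stack bars in matplotlib is to draw the second series with the `bottom` argument set to the first series. The sketch below assumes that approach (the post itself may build the chart differently, for example via pandas), and the quarterly figures are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical data: units sold per quarter for two product lines
quarters = ["Q1", "Q2", "Q3"]
product_a = [10, 12, 9]
product_b = [5, 7, 11]

fig, ax = plt.subplots()
ax.bar(quarters, product_a, label="Product A")
# bottom= starts each Product B bar where the Product A bar ends
ax.bar(quarters, product_b, bottom=product_a, label="Product B")
ax.set_ylabel("Units sold")
ax.set_title("Stacked bar chart")
ax.legend()
fig.savefig("stacked_bar.png")
```

For a third series, `bottom` would be the element-wise sum of the first two, which is where a NumPy array or a pandas DataFrame becomes convenient.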

Review: Pandas .loc vs. .iloc

When you use Python (3.6.2) for data analysis, the Pandas library (0.20.3) is typically used to navigate efficiently through your datasets. You select single values, slice datasets by row or column, or transfer a subset of data to a …

Posted in Data Science, Python, Tutorial
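
The essential difference is that `.loc` selects by label while `.iloc` selects by integer position, and that label slices include their end point whereas position slices exclude it. A small sketch with a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical example data with a string index
df = pd.DataFrame(
    {"population": [83, 67, 60], "capital": ["Berlin", "Paris", "Rome"]},
    index=["DE", "FR", "IT"],
)

# .loc selects by label; label slices include the end point
print(df.loc["FR", "capital"])          # single value by row/column label
print(df.loc["DE":"FR", "population"])  # rows DE and FR

# .iloc selects by integer position; position slices exclude the end point
print(df.iloc[1, 1])                    # same single value by position
print(df.iloc[0:1, 0])                  # only the first row
```

With the default integer index, the two can look interchangeable, which is exactly when the inclusive/exclusive slicing difference tends to cause off-by-one surprises.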

Tutorial: Extracting World Bank Data from CSV using Python and Pandas

One of the most important activities in data science is the extraction and collection of data, followed by its transformation into formats that can be analysed and interpreted. In this post, I will use the Python programming …

Posted in Data Science, Python, Tutorial
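
A typical transformation step for indicator data is reshaping a wide table (one column per year) into a long table (one row per country and year). The sketch below assumes such a wide layout with hypothetical column names and values held in an in-memory CSV; real downloaded files may differ, for instance by starting with metadata lines that need to be skipped with `read_csv(..., skiprows=...)`.

```python
import io

import pandas as pd

# Hypothetical CSV in a wide layout: one row per country, one column per year
raw = io.StringIO(
    "Country Name,Country Code,2015,2016,2017\n"
    "Germany,DEU,1.5,2.2,2.7\n"
    "France,FRA,1.1,1.1,2.3\n"
)
wide = pd.read_csv(raw)

# Reshape to a long table: one row per (country, year) observation
tidy = wide.melt(id_vars=["Country Name", "Country Code"],
                 var_name="Year", value_name="Value")
tidy["Year"] = tidy["Year"].astype(int)  # year labels arrive as strings
print(tidy.sort_values(["Country Code", "Year"]).to_string(index=False))
```

The long format is usually easier to filter, group and plot, since the year becomes an ordinary column instead of being spread across the header.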

Where to find useful sample data sets to practice with?

To become good at anything, there is one thing you need to do: practice, practice… and then practice some more. When that something is data analysis, however, you actually need data sets …

Posted in Data Science