data wrangling

List Comprehensions in Python

List comprehensions are a topic many new Python users struggle with. This article explains the benefits of list comprehensions and how they work, in a digestible manner.

Single for-loop list comprehension

The following code uses a traditional for loop to change each string in a for loop from upper […]
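Because the excerpt is cut off, here is a minimal sketch of the comparison it sets up. It assumes the task is lowercasing a list of uppercase strings; the list `words` and its contents are hypothetical, not taken from the article.

```python
# Hypothetical input list (the article's own data is not shown in the excerpt).
words = ["APPLE", "BANANA", "CHERRY"]

# Traditional for loop: start with an empty list and append one result per item.
lowered_loop = []
for word in words:
    lowered_loop.append(word.lower())

# The equivalent single-for-loop list comprehension: [expression for item in iterable].
lowered_comp = [word.lower() for word in words]

print(lowered_loop)  # ['apple', 'banana', 'cherry']
print(lowered_comp)  # ['apple', 'banana', 'cherry']
```

Both versions produce the same list. The comprehension simply folds the empty-list setup, the loop, and the `append` call into one expression.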

Ask Better Code Questions (and Get Better Answers) With Reprex

Note: This article was written about version 2.0.0 of the reprex package. In the forums and Q&A sections of websites like Stack Overflow, GitHub, and community.rstudio.com, there is a volunteer force of data-science detectives, code consultants, and error-fighting emissaries ready to offer assistance to programmers who find themselves staring down unhappy code that’s resisting placation. […]

Getting Started with Web Scraping in Python

“Web scraping” or “data scraping” is simply the process of extracting data from a website. This can, of course, be done manually: you could go to a website, find the relevant data or information, and enter that information into some data file that you have stored locally. But imagine that you want to pull a […]
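To make the "find the data, then store it in a file" idea concrete, here is a minimal sketch using only Python's standard library. It parses a hard-coded HTML string rather than fetching a live page, so the sample HTML, the `LinkCollector` class, and the choice to collect link text and URLs are all illustrative assumptions, not the article's method.

```python
import csv
import io
from html.parser import HTMLParser

# Stand-in for a downloaded page; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Example listing</h1>
  <ul>
    <li><a href="/a">First item</a></li>
    <li><a href="/b">Second item</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag on the page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> tag.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


parser = LinkCollector()
parser.feed(SAMPLE_HTML)
print(parser.links)  # [('First item', '/a'), ('Second item', '/b')]

# "Enter the information into a data file": write CSV rows into an in-memory
# buffer here; a real script would open a local file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "href"])
writer.writerows(parser.links)
print(buffer.getvalue())
```

Dedicated scraping libraries offer more convenient selectors, but the standard-library parser keeps this sketch dependency-free.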

Getting Started with pandas in Python

The pandas package is an open-source software library written for data analysis in Python. It allows users to import data from various formats (comma-separated values, JSON, SQL, FITS, etc.) and perform data-manipulation operations, including cleaning and reshaping the data, summarizing observations, grouping data, and merging multiple datasets. In this article, we’ll briefly explore […]

Getting Started with Regular Expressions

Regular expressions (or regex) are tools for matching patterns in character strings. They are useful for finding words or letter patterns in text, parsing filenames for specific information, and interpreting input formatted in a variety of ways (e.g., phone numbers). The core syntax of regular expressions is shared across many operating systems and programming languages, although details vary between implementations. […]
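The three uses listed above can be sketched with Python's built-in `re` module. The sample sentence, the `site_042_2021-07-15.csv` naming scheme, and the assumption of US-style 10-digit phone numbers are all hypothetical choices for illustration.

```python
import re

# 1. Finding words that match a letter pattern: words beginning with "re".
text = "We reread the report and then rewrote the summary."
re_words = re.findall(r"\bre\w*", text)
print(re_words)  # ['reread', 'report', 'rewrote']

# 2. Parsing a filename under a hypothetical naming scheme: site ID plus ISO date.
filename = "site_042_2021-07-15.csv"
match = re.match(r"site_(\d+)_(\d{4})-(\d{2})-(\d{2})\.csv$", filename)
site_id, year, month, day = match.groups()
print(site_id, year, month, day)  # 042 2021 07 15

# 3. Interpreting phone numbers written in different styles (assumes US-style
#    10-digit numbers) and normalizing them to one format.
phones = ["(555) 123-4567", "555.123.4567", "555-123-4567", "5551234567"]
phone_pattern = re.compile(r"\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})")
normalized = ["-".join(phone_pattern.fullmatch(p).groups()) for p in phones]
print(normalized)  # ['555-123-4567', '555-123-4567', '555-123-4567', '555-123-4567']
```

Using `fullmatch` for the phone numbers ensures the whole string fits the pattern, rather than accepting a valid-looking number buried inside other text.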

Databases for Data Scientists

As data scientists, we’re often most excited about the final layer of analysis. Once all the data is cleaned and stored in a format readable by our favorite programming language (Python, R, Stata, etc.), the most fun part of our work is uncovering counterintuitive causal relationships with statistical methods. If you can prove that […]

Creating an SQLite Database for Use with R

When you import or load data into R, the data are stored in random-access memory (RAM). This memory is cleared when you close R or shut off your computer: it’s very fast but temporary. If you save your data, it is written to your hard drive. But when you open R again […]
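The article itself works in R. As a language-neutral illustration of the same idea (data written to an on-disk SQLite file outlives the session that created it), here is a Python sketch using the standard-library `sqlite3` module. The file location, the `measurements` table, and its rows are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical location for the database file; a real project would use a fixed path.
db_path = os.path.join(tempfile.mkdtemp(), "example.sqlite")

# "Session 1": create a table in the on-disk database and insert some rows.
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE measurements (site TEXT, value REAL)")
con.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("A", 1.5), ("B", 2.25), ("A", 3.0)],
)
con.commit()  # make sure the rows are written to the file
con.close()   # stands in for quitting the session

# "Session 2": reconnect; the rows persisted, and a query pulls only what is needed.
con = sqlite3.connect(db_path)
rows = con.execute(
    "SELECT site, SUM(value) FROM measurements GROUP BY site ORDER BY site"
).fetchall()
con.close()
print(rows)  # [('A', 4.5), ('B', 2.25)]
```

Because the summary is computed by the database, only the aggregated result is loaded into memory, not the whole table.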

A Guide to Python in QGIS

This post is something I’ve been thinking about writing for a while. I was inspired to write it by my own trials and tribulations, which are still ongoing, while working with the QGIS API, trying to programmatically do stuff in QGIS instead of relying on available widgets and plugins. I have spent, and will probably […]

How to Use the Field Calculator in Python for QGIS 3

Recently, I have taken the dive into Python scripting in QGIS. QGIS is a really nice open-source (and free!) alternative to ESRI’s ArcGIS. While QGIS is a little quirky and generally not quite as user-friendly as ArcGIS, it still provides nearly the same functionality. Personally, I’ve become a fan of it and now […]

Getting Started with the purrr Package in R

If you’re wondering what exactly the purrr package does, then this blog post is for you. Before we get started, we should mention the Iteration chapter in R for Data Science by Garrett Grolemund and Hadley Wickham. We think this is the most thorough and extensive introduction to the purrr package currently available (at least […]