getting started with stata

a very short intro, for the good people at twaweza

# karibu!

Hello, and welcome! This is the website and repository for our mini-course on Stata, everyone's favorite statistical package. The course is meant for people with little to no previous experience with Stata or programming. It is being held in May 2015 at Twaweza.

[Comic: xkcd 722, "Computer Problems"]


# outline

What is Stata?

Stata is a popular software package used for data management, data cleaning, statistical analysis, and data visualization. While very flexible and powerful, it has a bit of a learning curve. Why? Because - unlike software such as Microsoft Excel - Stata has a pretty non-intuitive graphical user interface; it's built to be used via text-based commands instead. This can be a bit jarring for people who are used to working mostly via the graphical user interfaces of computer programs. That said, once you get over the initial weirdness of speaking to your data in text rather than clicks, you'll see that command-based interfaces can be much faster, clearer and more reliable than working via drop-down menus.

A note about open source alternatives

Another barrier to Stata, beyond the initial learning curve, is cost. Stata is not free; indeed, user licenses start at about $600 for non-governmental organizations in Tanzania. Similarly powerful - but free and open source - alternatives include R and Python's pandas library.

A note about learning by doing

The best - maybe the only - way to learn is by doing. This course is built to be equal parts gentle guidance and hands-on doing. I strongly encourage you to start thinking of ways to incorporate Stata into your workflow, and of projects you could use Stata on. The main way to learn Stata is to keep using Stata. Once you've mastered the concept of the .do file, and can type help into the Command window, you're basically ready to do anything. If you aren't sure how to incorporate Stata into a project, let me know and we can brainstorm!



# data

We'll be using three publicly available Twaweza datasets for this course. Download them here:



# course materials

Session 1: Introduction + Basics

Session 2: Your first .do file

Session 3: Big Stata concepts


Virtual office hours

For people attending the in-person mini-course in Dar es Salaam, I've set up a Skype group chat and invited you all into it. If you have any questions as you work through the homework, just ask me in the group. If you answer someone's question in the Skype group, I will personally buy you a Dot's cupcake of your choosing.

E-mail me if you'd like to be added to the Skype group chat.


Cheat sheet

Let's crowdsource a cheat sheet! As you use commands in your .do files, add them to our class cheat sheet so everyone can learn about them.



# more resources

This course is just the tip of the iceberg; its goal is mainly to get you off the ground with the basics. Stata is very powerful - hence its popularity! If you want to continue expanding your Stata knowledge (or if anything in the mini-course doesn't make sense), here are some additional resources:

And, of course, when in doubt, just type help!


Made with love in Dar es Salaam. Background image of the Stata Center at MIT, CC-licensed, by Robbie Shade.