Hello, and welcome! This is the website and repository for our mini-course on Stata, everyone's favorite statistical package. The course is meant for people with little to no previous experience with Stata or programming. It is being held in May 2015 at Twaweza.

Stata is a popular software package used for data management, data cleaning, statistical analysis, and data visualization. While very flexible and powerful, it has a bit of a learning curve. Why? Because - unlike software such as Microsoft Excel - Stata has a pretty non-intuitive graphical user interface, and it's instead built to be used via text-based commands. This can be a bit jarring for people who are used to working mostly via the *graphical* user interfaces of computer programs. That said, once you get over the initial weirdness of speaking to your data in *text* rather than *clicks*, you'll see that command-based interfaces can be much faster, clearer, and more reliable than working via the drop-down menus.

Another barrier to Stata, beyond the initial learning curve, is cost. Stata is not free; indeed, user licenses start at about $600 for non-governmental organizations in Tanzania. Similarly powerful - but free and open-source - alternatives include R and Python's pandas library.

The best - maybe the *only* - way to learn is by doing. This course is built around the idea of being equal parts *gentle guidance* and *hands-on doing*. I strongly encourage you to start thinking of ways to incorporate Stata into your workflow, and of projects you could use Stata on. The main way to learn Stata is to *keep using Stata*. Once you've mastered the concept of the .do file, and can type `help` into the Command window, you're basically ready to do anything. If you aren't sure *how* to incorporate Stata into a project, let me know and we can brainstorm!
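To make the idea of a .do file concrete, here is a minimal sketch of what one might look like. It is not one of the course files: the file name `first_steps.do` is hypothetical, and it uses `auto`, the example dataset that ships with Stata, rather than the Twaweza data, so it should run on any Stata installation.

```stata
* first_steps.do - a hypothetical example, not one of the course files
* Run it from the Do-file Editor, or type:  do first_steps.do

sysuse auto, clear      // load the example car dataset that ships with Stata

describe                // list the variables and their storage types
summarize price mpg     // observations, mean, std. dev., min and max
tabulate foreign        // frequency table: domestic vs. foreign cars

help summarize          // open the help file for any command you are unsure about
```

Each line is a command you could also type one at a time into the Command window (leave off the `//` comments there, since they are only understood inside .do files). Saving the commands in a .do file means you, or a colleague, can rerun the whole analysis later and get the same results.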

We'll be using three publicly available Twaweza datasets for this course. Download them here:

- Tanzania, 2012, Uwezo results (2MB, .dta) - This is a simplified version of the Tanzania 2012 Uwezo data. Full (quite large) data available here.
- Tanzania, 2012, Sauti za Wananchi baseline (32KB, .dta) - A simplified version of the Sauti za Wananchi (*Voice of the Citizens*) baseline data, collected in 2012. Full dataset, as well as a methodology paper, available here.
- Tanzania, 2015, Sauti za Wananchi constitution round (54KB, .dta) - Data from the January 2015 round of Sauti za Wananchi, asking the sample about the upcoming constitutional referendum. More info on this round, including questionnaire and accompanying policy brief, available here.

- Presentation (2.1MB, .pdf)
- Homework 1.1 - Learn how to open a .do file, read Stata commands, and find more information about them.
- Homework 1.2 - Use some of the commands you learned to ask basic questions about the Uwezo data.
- Presentation (1.3MB, .pdf)
- An example .do file (ugly)
- An example .do file (beautiful)
- Homework 2 - Write your own .do file. Compute basic statistics.
- Presentation (9.2MB, .pdf)
- Homework 3 - Mastering merges, loops, and locals.

This is just the tip of the iceberg; the goal of this course is mainly to get you off the ground with the basics. Stata is very powerful - hence its popularity! If you want to continue expanding your Stata knowledge (or if anything in the mini-course didn't make sense), here are some additional resources:

- Official Stata documentation
- Statalist.org - Unofficial Stata help forum.
- Cross Validated - Statistics help forum.
- Stack Overflow ('Stata' filter) - Programming help forum.
- Introduction to Stata - Comprehensive online Stata course, offered by the University of North Carolina.
- Which statistical test should I use? - Cheat sheet for Stata and statistics, prepared by the University of California, Los Angeles.
- Khan Academy: Probability and Statistics

And, of course, when in doubt, just type `help`!

