FIVE Project Articles

Murmann, J.P., et al., 2007. Automatic Coding of Printed Materials.

Abstract: The paper presents a complete method for coding printed text pages with automatic techniques. The method involves three automatic steps and one or two steps of manual correction to obtain fully accurate results. We found that present-day consumer digital cameras are much better than high-end scanners for obtaining pictures of printed pages quickly and without the wear and tear associated with scanners. We also found that high-end ($370) OCR software is much more cost-effective for achieving accurate text recognition and for processing large amounts of data. We also describe how researchers can write a computer program that automatically classifies non-uniform data. We provide detailed instructions for each step of the automatic coding method so that other researchers can readily replicate it.

Link to Article

Murmann, J.P., 2007. Constructing Effective Longitudinal Databases on Your PC.

Abstract: The paper presents a strategy for designing longitudinal databases with FileMaker. The approach makes data entry efficient and keeps the raw data flexible for constructing statistical analyses. The key feature of the strategy is to define the basic unit of observation in the database in terms of an agent, an event, and a date. Because programs such as FileMaker can easily sort data by agent and date, structuring the data correctly lets you construct well-ordered event histories for agents even if you enter the data in an unordered fashion. Using events that happened to an agent at a particular time as the basic unit of observation maintains maximum flexibility for statistical analyses that aggregate the basic data in different ways.

Link to Article
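
The agent, event, and date structure described in the second abstract can be illustrated with a minimal Python sketch. This is not the paper's FileMaker implementation. The `Observation` class, the two helper functions, and the firm names and dates are all hypothetical and serve only to show how records entered in any order can be sorted into event histories and aggregated in different ways.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One basic unit of observation: an event that happened to an agent on a date."""
    agent: str
    event: str
    when: date


def event_histories(observations):
    """Group observations by agent, with each agent's events ordered by date."""
    histories = defaultdict(list)
    # Sorting by (agent, date) orders the records regardless of entry order.
    for obs in sorted(observations, key=lambda o: (o.agent, o.when)):
        histories[obs.agent].append(obs)
    return dict(histories)


def events_per_year(observations, event):
    """Aggregate the same raw records differently: count one event type per year."""
    counts = defaultdict(int)
    for obs in observations:
        if obs.event == event:
            counts[obs.when.year] += 1
    return dict(counts)


# Hypothetical records, deliberately entered out of order.
records = [
    Observation("Firm B", "exit", date(1905, 6, 30)),
    Observation("Firm A", "entry", date(1890, 1, 15)),
    Observation("Firm B", "entry", date(1898, 3, 1)),
    Observation("Firm A", "exit", date(1905, 11, 2)),
]

histories = event_histories(records)
exits_by_year = events_per_year(records, "exit")
```

Because every record carries its own agent and date, the same list supports both the per-agent histories and the yearly counts without any change to how the data were entered.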