Budget Tools: Financial Methods in the Public Sector


Greg G. Chen, Dall W. Forsythe, Lynne A. Weikart & Daniel W. Williams


    Tables, Figures, and Features

    Tables
    • 2.1 Uniform Chart of Accounts, Personnel-Related Expenses
    • 2.2 Uniform Chart of Accounts, Non-Personnel-Related Expenses
    • 2.3 Example with Uniform Chart of Accounts, Neighborhood Association Expenses
    • 3.1 Costs and Assumptions for Child Care Center
    • 3.2 Child Care Initiative Budget
    • 3.3 Child Care Budget with Average Costs
    • 3.4 Payroll Database
    • 3.5 Standard Pay Scale
    • 3.6 Assumptions in Exercise 1: Turning Costs into Budgets
    • 3.7 Savings in Closing a Chicago Firehouse
    • 4.1 Expenditures for Health in New York City, Fiscal Years 1996–2007
    • 4.2 Holt Exponential Smoothing
    • 4.3 Grid Search for Holt Exponential Smoothing
    • 4.4 Trend Analysis: New York City's Investment in Health Expenditures, FY 1996 to FY 2007
    • 5.1 Parameters for the Student's Car Purchase
    • 5.2 Calculation of Annual Operating and Maintenance Costs of Car Purchase Alternatives
    • 5.3 Calculation of Discounted Cash Flows of Car Purchase Alternatives
    • 5.4 Solar Heater versus Electric Heater
    • 5.5 Cost-Benefit Analysis of Attending Master's of Public Administration Program
    • 5.6 Costs, Benefits, Benefit-Cost Ratio, and Net Present Value for Two Hypothetical Projects
    • 5.7 Cost-Benefit Analysis with Sensitivity Analysis of the Atlantic Yards Arena Investment
    • 5.8 Project Cost Components and Other Key Assumptions
    • 5.9 Level Debt
    • 5.10 Level Principal
    • 6.1 New York City Financial Plan, October 2007
    • 6.2 Sample of Table Showing the Effect of Out-Year Spending
    • 7.1 Framework for Performance Budgeting
    • 7.2 Trend Analysis: New York City's Investment in Health Expenditures, FY 1996 to FY 2007
    • 8.1 ABC Fire Department, First Month Variance Analysis, 2007
    • 8.2 ABC Fire Department, First Quarter Variance Analysis, 2007
    • 8.3 Variance Analysis for San Jose Operating Budget, 2007–2008
    • 8.4 Variance Report from the Department of Transportation, 2007–2008
    • 9.1 San Diego's Financial Plan FY 2008–2012, General Fund Corrective Action Steps
    • 9.2 Balance Sheet for the City of Charlottesville, Virginia, Government Funds Only, June 30, 2006
    • 9.3 City of Charlottesville Statement of Revenues, Expenditures, and Changes in Fund Balances, Governmental Funds, for the Year Ended June 30, 2006
    • 9.4 Financial Statement for a Small Nonprofit Organization
    • 9.5 Statement of Financial Position (Balance Sheet) for a Small Nonprofit Organization
    • 9.6 Statement of Activities (Income Statement) for a Small Nonprofit Organization
    • 9.7 City of Charlottesville Balance Sheet, Government Funds Only, June 30, 2005
    • 9.8 City of Charlottesville Statement of Revenues, Expenditures, and Changes in Fund Balances, Governmental Funds, for the Year Ended June 30, 2005
    • C.1 Appearance of Excel in Office 2000, 2003, or XP
    • C.2 Appearance of Excel in 2007
    • D.1 Sample Table without Formatting
    • D.2 Sample Table with Formatting
    • D.3 Sample Table with Title in Prominent Position
    • D.4 Sample Table with Rules Outlining the Header Rows
    • D.5 Sample Table without Formatting and with Embedded Title
    • D.6 Sample Table with Formatting, Prominent Title, and Indents
    • D.7 Sample Table Showing Rules That Span More than One Column
    • D.8 Sample Table with Formatting and Assumptions Following the Table
    • D.9 Sample Table with Formatting and Assumptions Embedded in the Table
    Figures
    • 3.1 Fixed Costs
    • 3.2 Step-Fixed Costs
    • 3.3 Variable Costs
    • 4.1 Holt Exponential Smoothing
    • 4.2 Personal Income Taxes in New York City, with Adjustment for Effect of September 11, 2001
    • 4.3 Focus on Bias: α = 0.4
    • 4.4 Focus on Bias: α = 0.6
    • 7.1 Budget Justification Format
    • 7.2 Budget Justification
    Exhibits
    • B-1 Example of Five Records from a Flat File from ICPSR Data
    • B-2 An Executable Syntax File
    • B-3 File Lacking Organized Labels

    Preface

    As professors of budgeting, we were frustrated by the limited number of explanations and exercises available for our students' use in traditional or theory-based budget texts commonly used in budgeting courses. Theory is essential in any budgeting course, but students need the ability to translate this information into practice and to feel comfortable reading, analyzing, and creating budgets. Thus, we wrote Budget Tools: Financial Methods in the Public Sector to fill this need.

    Budget Tools teaches students a specific set of skills needed to be successful when creating and tracking budgets, and it focuses on the tools and applications that are commonly used in the budget offices of government agencies and nonprofit organizations. It contains guided examples, data sets, and spreadsheet-based exercises on CD—all class tested and proven to work—and covers a range of topics and skills giving students a hands-on opportunity to master the application of budgeting concepts.

    Although we do not dig deeply into the budget process,* we use the milestones of the budget process to structure our book, which starts with organizing budget data in the preparation phase and ends with multiyear planning and analysis. Every chapter begins with learning objectives and a brief explanation of the topic. We offer plenty of explanation, teaching tips, and step-by-step examples, and we conclude each chapter with exercises using real budgeting data.

    After we introduce the scope of the book in chapter 1, we focus in chapter 2, “Organizing Budget Data,” on learning how to use a chart of accounts. We explain why accounting is relevant to budgeting, and we teach students how to organize and interpret budget data.

    We begin to explore the tools used in budget preparation in chapter 3, “Preparing the Operating Budget: The Spending Side.” In good times, the executive might ask for new initiatives; in fiscally challenging times, the executive might set targets for reductions and require budget cuts to meet those targets. Thus, we focus on analyzing new initiatives and calculating budget cuts to meet targets. We also discuss approaches to updating our budget base, including a section on projecting payroll.

    In chapter 4, “First Steps in Revenue Estimating,” we turn to the revenue side, specifically to revenue forecasting methods. From the many techniques, simple and complex, available to forecast revenues, we pick one to review in detail. It is essential that students learn to use spreadsheets in budgeting, and to that end we focus on the use of spreadsheets in forecasting.

    In chapter 5, “Preparing the Capital Budget,” we concentrate on the capital budget. We examine three areas of capital budgeting: life cycle costs, cost-benefit analysis, and financing capital needs.

    We have created a short chapter on the financial plan in chapter 6, “The Financial Plan and Budget Decision Making.” Financial plans seem simple, but knowledge of them is essential to the management of an organization, so we take time to understand their importance.

    In chapter 7, “Presenting the Budget,” we discuss the tools necessary to present the budget. We outline several topics, including performance budgeting and budget justifications.

    We explain what kinds of tools are needed to implement a budget in chapter 8, “Implementing the Budget.” We demonstrate how to create an operating plan, and we present ways to conduct a variance analysis.

    In the final chapter, chapter 9, “Multiyear Plans and Analyses,” we discuss multiyear budget plans and analysis. We also spend time on financial statements and present simple tools for analyzing them.

    All these tools would be of little account without a thorough understanding of spreadsheets. We have included five appendices, focusing on Microsoft Excel literacy for those needing a tutorial in spreadsheet basics, along with several guided examples of more advanced techniques such as understanding logic functions and developing macros in Excel. These appendices include “Spreadsheeting Basics,” “How to Produce Clear, Attractive Reports,” and “Advanced Spreadsheeting for Budgeting.” Each includes additional exercises to further increase Excel literacy and to perfect skills in creating and analyzing budgets. These special features and appendices make the book an easy reference guide that could be useful to students during their careers.

    Using the Companion CD and Instructor Solutions

    On the companion CD students and instructors will find data sets, support files, and exercise materials used in the exercises for most chapters in the book and several of the appendices. Users can open the spreadsheets with either Office 2003 or 2007 software. All the spreadsheet files were prepared in Office 2003.

    In some instances, when Office 2007 is in use and the file is saved as an Office 2003 document, noncritical messages such as these may appear:

    • The following features in this workbook are not supported by earlier versions of Excel. These features may be lost or degraded when you save this workbook in an earlier file format.
    • Significant loss of functionality. Any effects on this object will be removed. Any text that overflows the boundaries of this graphic will appear clipped.
    • Minor loss of fidelity. Some formulas in this workbook are linked to other workbooks that are closed. When these formulas are recalculated in earlier versions of Excel without opening the linked workbooks, characters beyond the 255-character limit cannot be returned.

    These messages have no functional effect. The user should select “Continue” (an option that will appear on the screen) and save the file.

    Every exercise comes with solutions available to instructors who adopt the book for their courses. To download solutions to Budget Tools exercises, instructors must visit http://college.cqpress.com/instructors-resources/budgettools/ and register for a new account. Access is limited to instructors only.

    Acknowledgments

    During our writing of this book, we have had the support of many people. We are indebted to Kristine Enderle, development editor at CQ Press, for her patience and wisdom. Her guidance has been invaluable. Others on the team—including copy editor Mary Marik, production editor Kerry Kern, and marketing specialist Erin Snow—have been terrific. Many thanks go to John Bartle, University of Nebraska–Omaha; Thad D. Calabrese, New York University; Sang Ok Choi, California State University, Dominguez Hills; Todd Lantin Ely, New York University; Andrew Glassberg, University of Missouri–St. Louis; Glen Krutz, University of Oklahoma; and Daniel L. Smith, Rutgers University–Newark, who reviewed the manuscript and offered valuable suggestions. These reviewers saved us from errors and omissions, yet, as ever, those that might remain are our responsibility alone.

    *For more on the budget process, see Dall W. Forsythe, Memos to the Governor: An Introduction to State Budgeting, 2nd ed. (Washington, D.C.: Georgetown University Press, 2004).

    Appendix A: Chart of Accounts

    All municipalities in New York State are required to use a standard system for classifying and coding accounting transactions.

    Classification

    A classification of accounts is a systematic arrangement of accounts based upon a definite scheme. The purpose of classifying accounts is to provide a standard format for recording and reporting financial transactions, which allows comparisons to be made with other municipalities or other financial periods. The classification system serves as a basis for budgeting, accounting, and reporting as well as for administrative control purposes, accountability to the Office of the State Comptroller and the general public, cost accounting, and the compilation of financial statistical data on the state level.

    Coding

    Coding of accounts facilitates the classification of data on source documents and the posting of entries in the accounting records. It enables identification of transactions quickly and provides consistency in reporting. The coding system used in New York State is an alphanumeric system—a letter or combination of letters followed by a series of digits.

    The alpha portion of each code, consisting of one or two letters, identifies the fund. The following funds are provided for municipalities in New York State.

    Appendix A is reprinted from Accounting and Reporting Manual (Albany: New York State Office of the State Comptroller, n.d.), Chapter 5, Classification and Coding Structure, http://www.osc.state.ny.us/localgov/pubs/arm/arm5.htm.

    The numeric portion of each code, which immediately follows the alpha portion, identifies general ledger, revenue and expenditure/expense accounts. The same account code number, where applicable, is used in all funds.

    General ledger codes have three digits and are arranged in balance sheet order: assets, followed by liabilities and fund equity.

    100–499  Assets
    600–699  Liabilities
    800–999  Fund Equity

    For example, 200 identifies the asset CASH in each fund. A200 identifies the asset CASH in the General Fund.

    Revenue codes have four digits and are arranged by source (where did the revenue come from):

    1000–2999  Local Sources
    3000–3999  State Sources
    4000–4999  Federal Sources
    5000–5999  Interfund Transfers and Proceeds of Obligations

    Each category is further subdivided to better identify the revenue source. For example, 2401 identifies INTEREST AND EARNINGS in each fund.

    GAAP requires revenues to be classified by fund and by source. Thus General Fund INTEREST AND EARNINGS would be classified A2401.

    Expenditure/expense codes have five digits and are arranged by functional unit and object of expenditure/expense. The term function refers to the primary classification and describes the purpose of the expenditure/expense. The first four digits identify the function:

    1000–1999  General Government Support
    2000–2999  Education
    3000–3999  Public Safety
    4000–4999  Health
    5000–5999  Transportation
    6000–6999  Economic Assistance and Opportunity
    7000–7999  Culture and Recreation
    8000–8999  Home and Community Service
    9000–9099  Employee Benefits
    9700–9799  Debt Service
    9900–9999  Interfund Transfer

    Each function is further subdivided to better classify the expenditure/expense. For example, expenditure code 1325 in the General Government Support function identifies the Treasurer's Office within the financial office of the municipality.

    Expenditures should be further classified by character, that is, on the basis of the fiscal period they are presumed to benefit. The major character classifications are: current, capital outlay and debt service. Character classifications may be accomplished by grouping the object classifications, discussed below, which are subdivisions of the character classification.

    The object of the expenditure/expense is a secondary classification and identifies the item purchased or service obtained in order to carry out a function. The object is identified by the fifth and final digit of the code:

    .1  Personal Services
    .2  Equipment and Capital Outlay
    .4  Contractual
    .6  Debt Principal
    .7  Debt Interest
    .8  Employee Benefits
    .9  Interfund Transfer

    Code 1325 from above can then be further classified as 1325.4 to indicate Contractual Services within the Treasurer's Office.

    GAAP requires expenditures to be classified by fund, function, character, and object. Thus a General Fund expenditure for Contractual Services within the Treasurer's Office is coded A1325.4.
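
    For readers who keep a chart of accounts in a spreadsheet or script, the structure just described is straightforward to automate. The following Python sketch is ours, not part of the Comptroller's manual; the function names are invented and the lookup tables are deliberately partial, for illustration only.

# Minimal sketch (illustrative only): compose and decompose a New York State style
# account code of the form <fund letter(s)><four-digit function>.<object digit>.

FUNDS = {"A": "General Fund"}                              # partial lookup
OBJECTS = {"1": "Personal Services", "4": "Contractual"}   # partial lookup

def build_code(fund, function, obj):
    """Return a code such as 'A1325.4' from its parts."""
    return f"{fund}{function:04d}.{obj}"

def split_code(code):
    """Split a code such as 'A1325.4' into fund, function, and object."""
    alpha = "".join(ch for ch in code if ch.isalpha())    # fund letter(s)
    function, _, obj = code[len(alpha):].partition(".")   # '1325', '.', '4'
    return {
        "fund": FUNDS.get(alpha, alpha),
        "function": function,             # 1325 = Treasurer's Office
        "object": OBJECTS.get(obj, obj),
    }

code = build_code("A", 1325, 4)
print(code)              # A1325.4
print(split_code(code))  # General Fund, function 1325, Contractual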

    The coding of expenditures/expenses may be expanded to include more detail such as department, location, and/or activity accounting. This can be accomplished at the local government level by further expanding the object of expenditure/expense code by adding additional digits. As an example, contractual expenditures (.4) listed above could be further refined:

    .41  Supplies and Materials
    .42  Utilities
    .43  Insurance
    .44  Professional and Technical Services
    .45  Rent or Lease
    .46  Operation and Maintenance
    .47  Miscellaneous

    Employee benefits, including such items as the locality's share of social security, retirement, and various types of insurance, may be recorded in one of two ways. The first method is to use the Employee Benefits codes (9000–9099) with the .8 object of expenditure. The alternative is to charge the employee benefits to the various functional units, again using the .8 object of expenditure.

    Appendix B: Further Discussion of Data Organization

    In a typical textbook the worst problem you encounter is a word problem that has been intentionally obfuscated to test whether you can untangle it. In the real world, untangling the nature of the problem is merely the first step to analysis. The larger problem has to do with data. It is not necessarily the case that data are not available. Indeed, the data may be too numerous. The problem is in figuring out which data are relevant and making the data ready to analyze.

    Budget circumstances usually do not afford the luxury of collecting new primary data for the purpose of analysis. The only time you will have data collected for you is when you participate in developing a computer database. Even in these circumstances, the data may not come to you until much later, and by then they will have begun to take on the characteristics of secondary data.

    Under most circumstances data should be entered into your analysis data files (formats include Excel, SPSS, or the data files for some other software) in the form of a database. This principle is particularly important when there are many pieces of data or when the data are subject to change. Databases have these characteristics (a minimal example follows the list):

    • Columns have unique column labels at the top
    • Columns are adjacent
    • Rows are adjacent
    • There is a column that contains a unique identifier for each row; usually this is the first column
    • Missing data are represented by leaving cells empty
    • No formulas are entered into data cells; in some circumstances a column is a transformation of one or more other columns, in which case the entire column contains the effect of the same formula
    • Different sets of data are entered in different databases; in the modern spreadsheet, this can be done by using a different tabbed sheet for each set of data.
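
    As a small illustration of these rules, the following Python sketch writes a hypothetical table to a CSV file: a unique identifier sits in the first column, every column has its own label, a missing value is left as an empty cell, and no formulas appear in data cells. The file and column names are invented.

import csv

# Hypothetical table laid out as a database: unique ID first, one labeled
# column per variable, missing data left blank, no formulas in data cells.
rows = [
    ["dept_id", "department", "fte", "overtime_hours"],
    ["D001", "Fire", 142, 310],
    ["D002", "Parks", 57, ""],   # missing overtime left blank, not zero
]

with open("departments.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)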

    Here are some of the data problems you will face and some solutions to those problems.

    Indicators vs. Measures

    One problem is that the data may measure something other than what you wanted to measure. Secondary data may not measure what you want to measure. Instead, you may need to use indicators, that is, indirect measures of what you want to measure. For example you may want to measure how warm the temperatures are in apartments that are in a certain apartment complex. However, all you may be able to measure is the BTU output of the heating unit. Because of many other factors, such as the insulating efficiency of the windows, the BTU output does not fully measure heat, but you can be relatively sure that as BTU output goes up, heat goes up. The BTU output is an indicator. An even weaker indicator, but still an indicator, is the fuel consumption of that same heating unit. In contrast, periodic readings from thermostats in the apartments are actual measures.

    Often, when using secondary data, you must opt for indicators because no one has made a data source available with data covering what you want to measure. It is important that indicators be realistically associated with what they are expected to indicate. Association between indicator and actual measure may be in four forms:

    • The indicator may be thought to be strongly linked to the cause of the variable you want to measure. For example, the BTU output measures a major component of the cause of apartment heat.
    • The indicator might be thought to be caused by the variable of interest. For example, housing quality might be caused by income and, thus, might be an indicator of income.
    • The indicator and the variable of interest might have a common cause. For example, height and weight, at least among younger individuals, share common growth causes. So height may be treated as an indicator of weight.
    • The indicator and the variable of interest may simply be highly correlated. For example, locus of residence in the United States is often treated as an indicator of political affiliation. In this last case, both in the example and the practice, the indicator is highly risky. The practice is particularly risky as unexplained correlations can end abruptly without explanation.

    Exercise 1. The “Exercise 1” tab in the “Exercises Appendix B” file contains several variables from the 1990 census for the twenty largest cities in the United States. One variable it does not contain is the proportion of the population that lives in apartments or other mass housing. Choose the variable among those available that is the best indicator, and rank these cities from highest (1) to lowest (20) proportion living in apartments based on this indicator.

    Data Are Not Ready

    Data that are not ready can take several forms. Data may be presented to you on paper, but you may need the data electronically in order to carry out analysis; data may come to you in a “flat” (text) file, PDF file, or in a file format that is unfamiliar; or you may have variable identifiers and characteristics that you must somehow associate with the data.

    Data presented on paper are actually not a problem if they are on a few pages, but they are a problem if they are on dozens or hundreds of pages. Flat files, PDF files, and other readable but unfamiliar file formats may be very similar to data on paper. The spread of PDF has made many more reports easily accessible but has in some ways reduced access to processable data.

    Paper Files

    When paper files are a few pages long, the easiest way to make them electronic is to enter them manually into the analysis software. Once the number of pages exceeds what can reasonably be keyed, a threshold only the analyst can judge given the available resources, the solution is to use a scanner and high-quality optical character recognition (OCR) software. Even high-quality OCR software will introduce errors. Consequently, the converted data must be examined for apparent errors. In addition, any striking, anomalous, or otherwise surprising findings should lead back to the data for verification.

    Handwriting generally cannot be converted by OCR, so handwriting has to be entered.

    Exercise 2. Enter the table below into an otherwise empty Excel spreadsheet. The totals should be computed, rather than entered. You will know that you have entered the correct data when totals match the totals in the table.

    Flat Files and SAS, SPSS, and Similar Programs

    It has long been the practice with SAS, SPSS, and other sophisticated data analysis programs for data sets to be supplied as flat files, with the files that contain processable records kept separate from those that contain data definitions and characteristics. Sometimes the analyst must take numerous steps to merge the data set with the data definitions. Sometimes the analyst does not have access to any of the particular programming options that allow merging and must, consequently, work straight from the flat file and the data definitions.

    Instructions for using simple SAS or SPSS executable syntax files can be found on their respective Web sites (http://www.sas.com and http://www.spss.com). It is not our purpose to repeat those instructions here, particularly as this problem is gradually fading away.

    An example of a flat file is shown in Exhibit 1 of this appendix, and a syntax file is shown in Exhibit 2. These have been downloaded from the Interuniversity Consortium for Political and Social Research (ICPSR, http://www.icpsr.umich.edu). Only a few lines of the flat file are shown. The syntax file must be opened through the SPSS open syntax menu and requires editing before it can be used. Editing usually means that the path of the file that contains the data and the desired final file name must be added. In the example file there is a bolded statement that says:

    DATA LIST FILE="physical-filename" /

    The portion in quotes is the portion that needs to be edited by specifying where you have put the file on your computer. The following instructions apply to Microsoft operating systems. After you have edited “physical-filename,” it may look something like this:

    DATA LIST FILE="C:\Documents and Settings\Your name\My Document\SPSS\MyFile.txt"/

    The term “Your name” must be replaced with whatever identification you use with your computer. If you do not supply a user login and you are the sole user of the computer, the default is “admin” or “administrator.”

    The term “MyFile” should be replaced by the actual file name.

    At the very end of the “syntax” you should add:

    SAVE OUTFILE= "C:\Documents and Settings\Your name\My Document\SPSS\MyFile.sav".

    EXECUTE.

    The bolded components should be edited. The second component requires the name you want to use.

    If you want to use the data in an Excel spreadsheet, you should follow through with these lines. (You will see, below, that there are restrictions on what you can save from SPSS to Excel.)

    SAVE TRANSLATE

    OUTFILE='C:\Documents and Settings\Your name\My Document\MyFile.xls'

    /TYPE=XLS /VERSION=8 /MAP /REPLACE /FIELDNAMES

    /CELLS=VALUES.

    EXECUTE.

    Again, the bolded components should be edited.

    An example Excel file, using the data found in Exhibit 1 and the code in Exhibit 2 with the modifications discussed here, is shown on the accompanying CD in the file named FlattoExcel.xls.

    These are instructions for SPSS. SAS is slightly different. For both of these programs, every single mark is relevant, and it is essential that the reader visit the SAS Web site.

    Sometimes other editing is required. These skills usually can be acquired with just a few days of experience working with the software.

    Sometimes flat files come with data labels on the first row and data in columns that are “beneath” the labels (on a wide enough sheet of paper). This may provide for simplified data importation. Or, if there are no more than 256 variables (16,384 for Excel 2007), it may lead the analyst to import the data through a spreadsheet procedure instead.

    This problem is gradually fading away because it is mostly associated with older data, collected when data storage was much more expensive than it is today. Nevertheless, an analyst may still encounter a data set that is described only by the number of characters in each record and by the position and number of characters, within a record, allotted to each variable.
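
    When a record layout is documented only by character positions, a few lines of code can stand in for a syntax file. The following Python sketch assumes the pandas library is available and uses an entirely invented two-record layout; it illustrates the approach rather than the specific files on the CD.

import io
import pandas as pd

# Hypothetical fixed-width layout: characters 1-5 hold a record ID,
# characters 6-7 a state code, and characters 8-13 an expenditure amount.
raw = io.StringIO(
    "00001NY004500\n"
    "00002VA003275\n"
)
colspecs = [(0, 5), (5, 7), (7, 13)]
names = ["record_id", "state", "expenditure"]

df = pd.read_fwf(raw, colspecs=colspecs, names=names)
print(df)   # two records, three labeled columns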

    After the file is imported into SPSS, it can be exported to Excel format with certain restrictions. The most important of these are that Excel supports a maximum of 256 columns (variables) and a maximum of 65,536 rows (records) (Excel 2007 supports 16,384 columns and 1,048,576 rows); a further practical restriction is your computer's memory capacity.

    Exercise 3. Using SPSS and using the files labeled Syntax.txt and Flat.txt on the CD, edit Syntax.txt as necessary and import Flat.txt into an SPSS (.sav) file, then into an Excel (.xls) file.

    Flat Files and Microsoft Excel

    Flat files can also be imported into spreadsheets with some restrictions:

    • There must be no more than 256 variables (16,384 for Excel 2007)
    • The flat file must have one continuous row of data for one record of data, and the row must break at the end of the record
    • It is highly desirable that the data be delimited
    • It is highly desirable that the first row of the data contain labels that are also delimited.

    Delimited means there is a clear, unique, and not-otherwise-used mark at the end of each variable or variable label. A CSV file is a “comma separated values” file, which refers to a file in which data are delimited by commas. Other common delimiters are spaces and tabs. Any clearly identifiable delimiter will do, although CSV files enjoy especially easy treatment by Excel.

    If the data are in a CSV file, Excel will automatically import the data into a spreadsheet, treating each comma as demarking a new cell.

    It is relatively easy to make a CSV file. For example, take the data in the following brackets {3,5,7,8,9,2}. Open Notepad on your computer. Type the data, omitting the brackets but including the commas. Save the file, and at the end of the file name type “.csv” (omitting the quotation marks). Be sure you know where you saved the file.

    Now open Excel and use the open file icon or use the file menu item to open a new file. Use the drop-down menu to select “all file types.” Find the file and open it. Excel should automatically open it as an Excel spreadsheet. Now save the spreadsheet. Excel will provide a message asking you whether you want to save it as it is (not an Excel format). If you select no, it will proceed to menus that lead to converting the file to an Excel spreadsheet.

    If, at the beginning, you save the file with an extension .txt instead of .csv—Notepad will default to .txt if you do not specify .csv—you can proceed through pretty much the same steps except that when you open the file Excel will do one of two things:

    • It will open in a menu that requires you to tell it which delimiter is used. The menu is intuitive; you should select comma.
    • It will import the whole string 3,5,7,8,9,2 as text.

    If the second event happens, you can call up the menu that allows you to convert the string into cells of data. First highlight the cell or cells to be converted, next select “Data,” then select “Text to Columns. …” The menu called up will permit you to tell Excel which delimiter separates the data. In Excel 2007, the “Text to Columns” icon is found on the “Data” ribbon.

    The process just described also applies to other delimiters, such as tabs, spaces, or other characters. For batches of files, the same conversion can be scripted, as in the sketch below.
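
    The following Python sketch, which assumes a hypothetical tab-delimited input file named data.txt, uses the built-in csv module to write a comma-delimited copy that Excel will open directly.

import csv

# Read a tab-delimited text file and rewrite it as a CSV that Excel opens directly.
# The file names are hypothetical; change the delimiter to match your source file.
with open("data.txt", newline="") as src, open("data.csv", "w", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow(row)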

    The user will have difficulty importing text data into Excel if:

    • The data do not have column labels; so after the data are imported, they still need to be identified.
    • The importation disorganizes the labels with respect to the data in some way; this is a relatively high risk when text or html reports are posted on government Web sites. The user must go back to the visible report and validate that the labels are properly aligned with the data. Exhibit 3 demonstrates this difficulty. When the authors downloaded it from the Bureau of Labor Statistics and imported it into Excel, the default settings imported June, July, and August into a single column. The authors had to go back and reset the columns (the menus are relatively intuitive) to get the desired results. When the authors did this, the headings at the top of the table were broken into several pieces, and some parts became uninterpretable. The result, with its broken heading rows, is shown on the accompanying CD in the file called “cpiai.xls.” The user must be prepared to either edit such files or copy and paste the headers after importing the data rows.
    • There are no reliable delimiters, or they change often throughout the source document. The user must make the text-to-columns conversion in tedious block-by-block steps.
    • Delimiters are omitted when data are omitted. When the data set is relatively small, this means the data set must be carefully edited after conversion. When the data set is large or very large, it is worthless.
    • The data are delimited by position only. This is the condition of some of the data demonstrated in Exhibit 1. That is, there are long strings of numbers where different numbers refer to different variables, and the only way to know which number refers to which variable is to know its position in the data set. The text-to-columns menu allows for separation of variables by position. It is important, in this case, to be very careful. In all likelihood, column labels will have to be added after importation. Also, this is a tedious process.

    Exercise 4. Download the most current file at the Bureau of Labor Statistics: ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt and import it to Excel. Make sure all columns of data are properly labeled with the correct column headings. (For those who do not have access to the Internet and who want to perform this exercise, there is a February 20, 2008, version of this file, called cpiai.txt, on the disk.)

    PDF Data

    There are two kinds of PDF data: PDF graphics-only data and PDFs with underlying text data. Some PDF files are locked so that the underlying text data are inaccessible to the end user, although some off-brand PDF programs may ignore the locking mechanism. Whenever possible, the PDF program should be used to export the PDF text data to any permissible text format, from which it can then be imported into Excel, if necessary. Under almost all circumstances, however, ABBYY PDF Transformer (a commercial product1) will extract the text data, produce a relatively high-quality OCR of the graphics data, and save the result as an Excel spreadsheet. It can also transform graphics-only text, which PDF programs may not be able to extract. If the files are huge, the conversion should be broken into parts.

    Acrobat, beginning with version 4, offers “Column Select” and “Table Select” tools that make copying data simpler when there is underlying text. The availability of these options depends on the specific version of Acrobat or Acrobat Reader you have. When you have the column select option, and especially the table select option, it may be possible to copy an entire column or table to a spreadsheet without going through intermediate steps.

    Exercise 5. This exercise requires Adobe Acrobat Standard or Adobe Acrobat Professional. The precise process depends on the precise version you have (version 5 or higher, Standard or Professional). It will not work with PDF Reader, but it might work with alternative brands or PDF extractor programs.

    There is a file provided labeled “censr-16-page-4.pdf.” This is an extract from a report found at the U.S. Census Bureau site about the 2000 Census, Population and Housing, at http://www.census.gov/prod/cen2000/index.html. It is specifically a Census Special Report on “Areas With Concentrated Poverty.” The download file URL is http://www.census.gov/prod/2005pubs/censr-16.pdf. Open the page 4 extract with Adobe Acrobat Standard or Professional. Using the menus, save the file as a text file (.txt). If given a choice, select Plain Text. From Excel, import the file using the procedure previously used for .csv and .txt files. You will find it necessary to delete the text-only material and to perform considerable reorganization. After some editing, most likely considerably less effort than keying the whole table from paper, you will have the table organized the same way within a spreadsheet.

    If your software supports “Table Select,” you can substantially reduce the effort using the “Table Select” feature. The image below shows the “Table Select” menu. First highlight the entire table and nothing else. Second, right click to call up the table select menu. Then click on “Copy As Table” (to paste in a spreadsheet) or, for later versions such as this one, click on “Open Table in Spreadsheet.”

    [Image: the “Table Select” menu called up over the table “Poverty Thresholds (Annual Dollar Amounts) by Size of Family and Number of Related Children Under 18 Years Old: 1999”]

    If you have access to ABBYY PDF Transformer, you can directly import this table into an Excel spreadsheet. Results may still require some editing because, although PDF Transformer reads graphics and performs OCR transformation, it does not use any underlying text information. With this particular file, the authors found approximately equal amounts (but different kinds) of editing for the file with the two alternate text-oriented processes, but it was considerably less effort using the “copy as a table” process.

    File Conversion

    Briefly, SPSS, SAS, and many other programs are able to export files to either Excel format or to CSV format. Excel is able to import data from most databases (Access, Dbase, and others). For more sophisticated analysis, SPSS, SAS, and other powerful programs are able to import appropriately formatted data from Excel.2

    Appropriate formatting strictly follows a proper database design, which is to make one record (row) for each observation unit and one variable (column) for each characteristic, while labeling each column with a variable name. It has been the authors' experience that SPSS does not respond well when the first data row (the first row following the labels) has any empty cell reflecting a missing value.

    As with SPSS and SAS, these instructions assume that you will have other resources for developing skills for those purposes. These are instructions for getting data from those files into Excel for the kind of analysis typically performed in budgeting.

    Exercise 6. There is a supplied file, Earmark.mdb (an Access database file). It is paired with another file, Earmark.dbf (a Dbase file that is used, in this case, by Access). This exercise requires first opening Earmark.mdb with Microsoft Access and, using the tables feature, updating the location of Earmark.dbf to its location on your computer. Then, using Excel, use the open menus to open files of ALL TYPES. Find the file “Earmark.mdb” or “Earmark” with an Access database icon and open it. A menu item will ask you whether you want to open the query to external data. Click “Yes.” The result should be two columns. The first column is a list of entities reported as receiving earmarks in the database as downloaded from an Office of Management and Budget site, http://earmarks.omb.gov/download.html, on June 14, 2007. (This database was actually in a more user-friendly format and was converted first to a Dbase database and then to an Access query to create this exercise.) The second column is the reported total amount of earmarks that each recipient received.

    Note that Excel can import entire Dbase or Access tables or it can import query results. Queries are used to summarize one or several columns keyed to some other column or columns. Where the Dbase or Access tables are small, it may be more effective to import the entire table. Where they are large, it is more likely that a query should be used as the first step to reduce the size of the file to be imported into Excel. If Access query functions are inadequate, another approach is to import the database into more powerful analysis software such as SPSS, SAS, or STATA and perform analysis there.

    Data Require Collecting or Editing

    The next set of data problems is not about access to the data; these are problems with the data themselves:

    • You might collect the data in pieces and have to build records.
    • You might have multiple sources of data that are similar but with significant differences without a clear indication of which is more reliable.
    • The data records may be missing data or may have uninterpretable values.
    • The data records may have values that are extraordinary and improbable, but not subject to validation.

    Data from Multiple Sources

    A typical problem for practitioners is that before you can analyze your records you must construct your records. This does not mean that you must make observations and collect primary data. Instead, the observation units may be the frequent subject of reports, but the reports may be spread around to different locations.

    For example, if you are interested in the population of your jurisdiction and other jurisdictions in your state, you will find data on the Web site of the U.S. Census Bureau. You may be interested in the local Consumer Price Index (CPI), which you will find on the Web site of the Bureau of Labor Statistics. You might be interested in median home values, which is information you can find from your own real estate appraisal office and, for other jurisdictions, possibly from a statewide report. We could go on adding variables of interest; the point is that the data are not especially difficult to obtain, yet they may not exist in a single pre-existing data set.

    There can, however, be difficulties with such data, and these difficulties may be subtle. One of the difficulties has to do with careful attention to observation units. The observation unit may vary from data source to data source, or even from variable to variable with the same data source.

    For example, for the population, the Census Bureau can give you the actual population for the most recent census year, an estimate for the prior year, and a forecast for the current and future years. The population can be for the jurisdiction or for a smaller area known as a census block. The CPI, on the other hand, is available at the state level and for metropolitan statistical areas, which are usually multijurisdictional. A careful review of CPI material also reveals the frequent use of the word “urban.” Non-urban jurisdictions are not represented in the data.

    Thus, one way that observation units can vary is that the locale itself can vary from one data source to another. In some instances there may be at least a small leap of faith in believing that the data represent the locale at all. The user must be thoughtful in determining how this variation in observation unit impacts any analysis that may be conducted.

    The second substantial source of variation is time. For example, a jurisdiction may have a July through June fiscal year and report all of its data consistent with this time period. If that jurisdiction is in Texas, the state fiscal year is September through August; or if it is in New York, the state fiscal year is April through March. State data will be reported consistent with that time period. The federal government and some other states have an October through September fiscal year and usually report data consistent with that time period. Some data are reported according to the calendar year. Thus, when data are reported as 2010 data, one must ask: Which 2010? Is it 2010 ending in April, June, August, September, December, or some other date? Treating data ending in periods far enough apart as parallel can confound some sorts of analyses. It may be necessary to make an adjustment. For example, if state data end in June and federal data end in September, it may be necessary to adjust the federal data by moving a quarter of each year's data forward to the next year.
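
    The quarter-shifting adjustment can be sketched with a few lines of Python. The annual totals below are invented, and the sketch assumes, as a simplification, that spending is spread evenly across quarters; one quarter of each federal fiscal year (ending in September) is shifted so the totals approximate a state fiscal year ending in June.

# Invented federal fiscal year totals (millions of dollars), years ending in September.
federal = {2008: 400.0, 2009: 420.0, 2010: 450.0}

# Approximate a July-June state fiscal year: keep three quarters of the matching
# federal year and add the last quarter of the prior federal year.
state_basis = {}
for year in sorted(federal):
    prior = federal.get(year - 1)
    if prior is None:
        continue   # the prior year's last quarter is needed to rebuild this year
    state_basis[year] = 0.75 * federal[year] + 0.25 * prior

print(state_basis)   # {2009: 415.0, 2010: 442.5}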

    The third substantial source of variation is the variation among the sources themselves. This problem is a bit tricky. The difficulty is that different information sources might be of different reliabilities and, thus, not agree with each other. Estimates of the number of homeless in U.S. cities differ by a factor of 100 among different sources. Typically, but not always, any official government statistic or estimate is likely to be less biased than estimates provided by advocacy or taxpayer groups.

    Here are a few guidelines that are somewhat helpful:

    • When possible, examine methodology descriptions to know exactly what published data are supposed to communicate.
    • Prefer data supplied with supporting methodology descriptions to data supplied without.
    • Prefer official government data unless you have specific reasons to be suspicious.
    • When you have several competing sources of data and no clear reason to prefer one to another, average them. There is considerable evidence that averages of certain kinds of forecasts are less inaccurate (in the aggregate) than any specific forecast regardless of reason.

    Exercise 7. This exercise requires visiting several Web sites, so it is a bit complicated. The first site is the U.S. Census Bureau at http://www.census.gov/Press-Release/www/2002/demoprofiles.html. The Census Bureau has been known to reorganize its Web site, so an example data set (for the state of Virginia) has been downloaded to the supporting disk; details are at the end of the exercise.

    • At the Census Bureau Web site, click on the link for All Files beside the heading FTP Download (the direct URL to this location is http://www2.census.gov/census_2000/datasets/100_and_sample_profile/). Click on a selected state. This might be your state of origin in the United States, or, if you are not in the United States, a point of entry that you might use, or a state designated by your instructor.
    • Download two files.
      • The first file is labeled “2kh[##].pdf,” where [##] stands for two digits. In the case of Virginia, it is 2kh51.pdf (on your disk); for New York, it is 2kh36.pdf.
      • The second file is labeled “ProfileData[ST].ZIP,” where [ST] is replaced by a state abbreviation, such as OH for Ohio or VA for Virginia (on your disk it is ProfileDataVA.ZIP). You need the PDF file to interpret the .csv files found inside the .zip file. Microsoft XP or higher should open the .zip file without a supplemental program. Inside the .zip file, there are four .csv files.
    • Import into Excel the one labeled “2kh[##].csv” (without a supplemental numeral at the end of the file name). Depending on your computer conditions, you may first have to extract the .zip file into a non-zip directory.
    • With the PDF file as a guide, locate the column that contains the community identifiers (names of jurisdictions) and the overall population of those jurisdictions in the 2000 census.
    • Save the file as a .xls; then (a) delete all other columns, (b) insert a row at the top of the file for column headings for the two columns that remain, and (c) provide brief headings for these two columns. Suggested headings are “Locality” and “Population.” Finally, save the file again. The purpose of the first save, with the alternate file name, was to avoid damaging the original file if you needed to go back and use it again.

    The second Web site to visit provides a list of metropolitan and micropolitan statistical areas (MSAs): http://www.census.gov/population/www/estimates/metrodef.html. The Census Bureau has a .txt version of this file at http://www.census.gov/population/estimates/metro-city/99mfips.txt (this version is also included on the CD and is labeled “99mfips.txt”), and a variety of Excel, PDF, and CSV files containing MSA information can be found at http://www.census.gov/population/www/cen2000/phc-t29.html. In addition, many other reliable Internet sites list MSAs. The current Census Bureau list at time of publication is on the disk as cbsa-01-fmt.csv and as CBSA03_MSA99.xls. For your chosen state, identify every locality that is in an MSA.

    The third Web site is the Census Bureau's statistical abstract on consumer prices at http://www.census.gov/compendia/statab/2006/prices/consumer_price_indexes_cost_of_living_index/. The table listed as “Table 709. Cost of Living Index—Selected Metropolitan Areas” shows the relative cost of living of different communities. (This is a list of selected communities. A commercial product that is more complete is sold by the Council for Community and Economic Research, http://www.c2er.org/.) For the identified MSAs in your selected state, key the index number by all the localities within the MSA. For all other communities, average the value 100 and the value of the nearest MSA to produce an estimate. If no locality in your state is as high as 100, then for other localities average the two nearest MSAs.

    For the sake of simplicity, this assignment stops here. The objective is to experience the search, download, and organize activities of collecting data from relatively friendly secondary data sources. Actual use of secondary data may be easier or considerably more difficult.

    Data Require Editing or Imputation

    When observation values are missing, uninterpretable, or improbable, the data may require editing or imputation.

    If values are missing or uninterpretable and there are numerous observations, the substantial risk is that the missing values may incidentally bias any subsequent analysis. How does such a bias arise?

    First, the missing values themselves may be relatively inconsequential, yet during analysis with software such as SPSS the variable may seem interesting. The reason a value is missing for some observations may itself reflect a biasing characteristic of the data. For example, people with other specific characteristics may decline to respond to ethnicity queries, leading to a missing or missing-equivalent value in an ethnicity variable, and the ethnicity variable may be thought to be interesting to the analysis. Introducing this variable would then bias the analysis, because almost all software handles the missing-data problem by excluding records that have missing values.

    Thus, one must be very sensitive to the effect that including variables with missing values has on statistics of interest in other variables in the analysis. The first step is to watch the impact on the number of observations. If, with the introduction of a new variable, the number of observations that are included changes, the variable has missing values in records that did not have missing values prior to its introduction. Consequently, the analyst should determine the conditional mean and standard deviation of all the other variables, considering only those records that remain whole after the introduction of the new variable, and compare them with the same two statistics computed over all whole records before the introduction of the new variable.
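
    A minimal sketch of that check, using the pandas library and an invented data set; the variable names are hypothetical.

import pandas as pd

# Invented data: 'ethnicity' is the newly introduced variable with missing values.
df = pd.DataFrame({
    "spending":  [210, 340, 180, 400, 290, 150, 370, 260],
    "caseload":  [12,  20,  9,   25,  16,  8,   22,  15],
    "ethnicity": ["A", None, "B", "A", None, "B", "A", "B"],
})

before = df[["spending", "caseload"]].agg(["mean", "std"])
after = df.dropna(subset=["ethnicity"])[["spending", "caseload"]].agg(["mean", "std"])

# Substantial differences between the two tables suggest that introducing
# 'ethnicity' biases the analysis by silently dropping systematically
# different records.
print(before)
print(after)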

    If there are any substantial changes in values, these changes, not the increased analytic effect of the additional variable, likely explain any changes in the analysis. In this case, remedial action must be taken. Remedial action may be to exclude the variable that introduces the bias. There are additional remedies as well:

    • The best thing to do for missing values is to find the actual value and replace the missing value. In most cases, however, this is not possible.
    • When data occur in a time series, the missing value can be interpolated through one of these processes (a short sketch follows this list):
      • If the data are not seasonal, the several observations before and after can be averaged, with the result substituted for the missing value. If the data show little randomness (move around very little), it is probably enough to average one observation each before and after. If there is a lot of randomness, it is probably best to include two or three observations from each period, before and after.
      • If the data are seasonal (move in an obvious annual pattern), the same averaging approach can be used, but the observations should be taken from the year before and the year after.
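
    A minimal sketch of the nonseasonal case, using an invented monthly series with the missing value stored as None:

# Invented nonseasonal series; replace the missing value with the average of its
# neighbors (widen the window to two or three values on each side for noisier data).
series = [102, 104, 103, None, 106, 105, 107]

i = series.index(None)
neighbors = [series[i - 1], series[i + 1]]
series[i] = sum(neighbors) / len(neighbors)
print(series)   # [102, 104, 103, 104.5, 106, 105, 107]

# For seasonal data, take the neighbors from the same period in the year
# before and the year after instead of the adjacent observations.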

    Exercise 8. In the “Exercises Appendix B” spreadsheet, look at the tab labeled “Time Series.” Interpolate the missing observation using the recommended method.

    With cross-sectional data, the missing value can be interpolated through one of three approaches. The most reliable way is to find other variables in the data set that correlate with the variable that has missing values and build a regression or other correlation-based model to predict the value for those records where the value is missing. Substitute the model prediction for the missing values.
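
    A minimal sketch of the regression approach, using invented cross-sectional data and numpy's least-squares polynomial fit; in practice the predictor would be a variable from your own data set that correlates with the one being imputed.

import numpy as np

# Invented data: predict a missing expenditure from a correlated variable.
population  = np.array([12.0, 25.0, 40.0, 55.0, 80.0])   # thousands of residents
expenditure = np.array([3.1, 6.0, 9.8, np.nan, 19.5])     # millions of dollars

observed = ~np.isnan(expenditure)
slope, intercept = np.polyfit(population[observed], expenditure[observed], 1)

# Substitute the model prediction for the missing value (about 13.4 here).
expenditure[~observed] = slope * population[~observed] + intercept
print(expenditure.round(2))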

    The next best method is to choose, based on at least quasi-expert opinion, five to ten other key variables in the data set, and then match the observation that has the missing value with other observations on those variables. Using the matched records, do one of the following:

    • Calculate the mean of the variable with the missing value and use it as a substitute,
    • Use some other value such as the mode or median (particularly relevant for categorical or ordinal data), or
    • Randomly choose a value (using random numbers, not haphazard methods) from among actual values observed.

    If, for whatever reason, these methods are not available or fail (for example, no correlates and no matches), use one of the following:

    • The mean of the variable,
    • Some other value such as the mode or median, or
    • A randomly chosen value from among values actually observed.

    Exercise 9. There is a tab in the “Exercises Appendix B” spreadsheet labeled “Federal Data.” Find at least two ways to interpolate the data.

    Values Are Improbable

    Sometimes the problem is not that the data are missing or uninterpretable, but the value itself causes concern. In this case, you might suspect that the value is an outlier, that it is an extreme value with respect to the other values.

    One should keep in mind that, for roughly normally distributed data, about 99 percent of the observations fall within three standard deviations of the mean, so with 100 observations roughly one should fall outside that range. With a small data set, having a value outside three standard deviations from the mean is evidence on the face of it that it is an outlier. With a larger data set, the value should be substantially outside three standard deviations from the mean to be considered an outlier.

    The correction for an outlier is to move it to three standard deviations from the mean, on the same side of the mean as it already is. For time series, it should be moved to three standard deviations from a relatively short moving average (for example, a twelve- or twenty-four-period moving average).
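
    A minimal sketch of the cross-sectional correction, with invented values. As a simplifying choice for the sketch, the mean and standard deviation are computed from the other observations so the suspect value cannot mask itself.

import numpy as np

# Invented cross-sectional values with one suspiciously large observation.
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])
suspect = np.argmax(np.abs(values - np.median(values)))   # index of the extreme value

# Mean and standard deviation from the remaining observations.
others = np.delete(values, suspect)
mean, sd = others.mean(), others.std(ddof=1)

if abs(values[suspect] - mean) > 3 * sd:
    # Move the outlier to three standard deviations from the mean,
    # on the same side of the mean where it already sits.
    values[suspect] = mean + np.sign(values[suspect] - mean) * 3 * sd

print(values.round(2))   # the 95.0 is pulled back to about 14.65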

    Exercise 10. On the tab labeled “Federal Data” in the “Exercises Appendix B” file, the District of Columbia observations appear unusual. Determine whether each is an outlier. If so, determine the appropriate replacement value.

    Sensitivity Analysis

    When your data set has interpolated values (or any random values are used) and there are several ways to interpolate the data, as is often the case with cross-sectional data, it is likely worthwhile to make several alternative interpolated data sets and to conduct the same analysis with each set to determine the effect of the interpolation. When an outlier has been corrected, an alternative scenario can be tested without the outlier correction.

    The use of multiple options or scenarios is called a sensitivity analysis. The purpose is to determine how much the final result you have is dependent on the specific choices you made during the process of analysis.

    Exercise 11. In the “Exercises Appendix B” file, for each of the four variables in the tab labeled “Federal Data,” determine the average value for each alternative value used (missing value not changed, each interpolated value, both with and without any substitutions for outliers).

    Data Definitions Change

    You may know, or suspect, that the definitions of the variables have changed at specific points in time, or gradually over time, or at some time in the past although you are not sure when.

    One of the more difficult problems with data is that the meaning associated with the data elements (variables) can change over time. For example, a jurisdiction may consolidate or lose territory as other jurisdictions are formed. Carelessness may lead to the incorrect impression that associated characteristics—for example, area or taxable real property—do not substantially change when in fact they do.

    There are many ways this problem may occur, and for long-running data series a change in definition may be more common than the alternative, that there is no change at all. It may take subtle attention to your data series to discover such changes. Following are some examples of what may contribute to changing data definitions.

    A series may be an aggregate of many smaller series. Over time, the smaller series may have additions and removals, thereby changing the definition of the data. For example, public employees operating ambulance and ambulance-like services may be moved from the public safety category to the public health category of a budget report. Consequently, the growth in public safety will be understated and the growth in public health will be overstated. The corrective action is to identify the employees and associated expenditures for the periods prior to the shift and adjust the amounts back to the beginning of the data series, thereby correcting the basis of the reporting.

    A similar case the authors actually experienced involved Medicaid payments to physicians. During the early 1990s the federally based coding system for physician services was substantially changed. Consequently, physicians were forced to choose new codes for services they had long performed. These new codes led to “up coding,” in which physicians chose the higher paid of two codes in many instances where there was a choice. As a result, following the change in the coding system, Medicaid expenditures were substantially higher than before the change. The underlying dynamic was a redefinition of the services that led to changing expenditures.

    Another way a definition can change is through deliberate redefinition of the whole series at a point in time. The telltale sign of this sort of change is an abrupt shift in the quantities associated with the series. Public policy changes sometimes have this effect, abruptly adding new people to service groups or adding new services for people already in service groups. Well-documented time series may record these punctuated shifts.

    A third way that a definition can change over time is through gradual definition shift. This sort of change may be the most difficult to discover. Gradual shift may result, for example, when data definitions are poorly documented at the start of data collection and are then updated through practice or as staff or management change.

    Exercise 12. In the “Exercises Appendix B” file, on the tabbed sheet labeled “Terrorist,” identify the year or years when the data may have experienced definition changes.

    Analyzed Data Are the Only Data Available

    You may receive previously analyzed data when you would prefer to have the records that were used to produce those analyses.

    Perhaps the most frustrating data problem is to find the information you want, but not quite. You may be looking for information on people of a certain demographic, perhaps by gender, age, or economic strata. But what you find is a report that has more general information on a larger population. It includes the data you want, but also other people as well. The same result can arise with undesired service units, undesired service locations, or any other excess individual observation units.

    What you, as the analyst, would like is the data set upon which the report you have found is based, along with an assurance that there is a variable that records the distinguishing characteristic you are interested in. Your first option might be to go to the source of the report and ask for just that. Often, however, pursuit of that option will be unsuccessful. What else can you do?

    Rather than less analysis, what the report now needs is more analysis. Typically, this additional analysis involves ratios (also called proportions or percentages). It is likely that the ratio or ratios will come from sources other than the report you already have in hand. Using the ratios, you break down the information in the report to find the part you are interested in.

    The ratio-breakdown method may be inexact and lead to substantial errors. Thus, it may be useful to also apply a sensitivity analysis. Alternative scenarios can be made by shifting ratios a tenth of their own size.
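
    For example, with purely hypothetical numbers: if a report gives total program spending of $10 million and another source indicates that roughly 40 percent of that spending goes to the subgroup of interest, the ratio-breakdown estimate is 0.40 * $10 million = $4 million. Shifting the ratio by a tenth of its own size, to 0.36 and 0.44, gives alternative estimates of $3.6 million and $4.4 million, which provide the alternative scenarios for the sensitivity analysis.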

    Exercise 13. In the “Exercises Appendix B” file, using only the information on the tab labeled “Federal Debt,” find the gross domestic product for all years listed. Is a sensitivity analysis needed? Why or why not?

    Sometimes pre-analyzed data are perfectly acceptable except for one difficulty. For whatever reason, the organization of the information as presented in the original report obscures the information you are looking for. Following is a very simple example: You may have a report that has two simple tables. The first table shows the number of children (under age nineteen) who are receiving a service and the percentage who are female. The second table shows the number of adults receiving a service and the percentage who are male. See the following example tables:

    Table 1. Children receiving the service: 250; percentage female: 46 percent
    Table 2. Adults receiving the service: 400; percentage male: 35 percent

    You may be tasked to determine, among males, what proportion are children. These two tables contain the information, but you must analyze the data to find it.

    First, you take the table about children and multiply 250 * 0.46, then subtract the result from 250. The result is

    250 - (250 * 0.46) = 250 - 115 = 135

    This is the number of male children served. Next, you multiply 400 * 0.35 to get

    400 * 0.35 = 140

    This is the number of male adults served. Then you add the two results to get 275, which is the number of males served. Dividing 135 by 275 results in

    135 / 275 ≈ 0.49 (about 49 percent)

    So, the information was there, it simply had to be teased out.

    Exercise 14.

    • For the hypothetical information given in the example above, find the proportion of female children to all females.
    • For the hypothetical information given in the example above, find the proportion of males for both children and adults together.

    Where Can You Find Data?

    You might be confident that there are relevant secondary data, but you do not know where to look for the data. There is no comprehensive solution to the problem of not knowing where to look for data. Following are some Web sites and strategies.

    Federal Government Data

    The most comprehensive source of federal government data is http://www.fedstats.gov. This site is intended to be a reference site for other federal government data sites. It is, however, aimed at the popular consumer, so if you know what you are looking for, it is easier to go directly to the source site.

    Federal data related to budgeting may be found at http://www.census.gov (census data), http://www.bls.gov (consumer and producer price indexes), http://www.bea.gov (various economic data), and http://www.whitehouse.gov/omb (particularly official discount rates for present value calculations, but also other miscellaneous OMB publications and information). Other federal government sources are too numerous to list, but one that may not be apparent is the Central Intelligence Agency's World Factbook at https://www.cia.gov/library/publications/the-world-factbook/index.html.

    State and Local Data

    States generally post great quantities of data at their Web sites, which are generally found at www.[ST].gov or www.state.[ST].us. For example, http://www.NY.gov reaches the New York state Web site, while http://www.state.VA.us reaches the Virginia Web site. State Web sites are too diverse to document here. The best approach is to begin at the top-level Web site and use the menus to explore for the data you are looking for.

    The official Web site addresses for local governments follow one of these formats: www.ci.[CITY].[ST].us, www.co.[COUNTY].[ST].us, or www.town.[TOWN].[ST].us. Some school systems use www.[CITY/COUNTY].k12.[ST].us. However, this practice is not universal, and there is much duplication in school district names, so the user will have to discover the particular practice within a particular state. Often local jurisdictions register other domain names that end in .org or .gov.

    Data access and availability are highly variable at the local government level. When looking for local government data, however, one should not omit looking at state or even federal sources. A good state source is the particular local government commission. A good federal source is the Census Bureau. Almost all federal agencies collect and widely publish data, so for particular kinds of questions, one should examine the Web sites of the relevant agencies. Often federal agencies publish contact information for requests for more extensive data.

    Quasi-Private Sources of Data

    Some of the more important sources of data on state government include:

    Other Sources of Data

    Governmental Accounting Standards Board, http://www.GASB.org. This site contains little data, but it is worthwhile to know about. It contains a subordinate page, http://www.seagov.org/index.shtml, which lists numerous performance measurement links. Some of these may contain data.

    These are, of necessity, only partial listings with emphasis on sources that may have financial information. The ICMA and the GASB listings are of special interest as they may contain particular comparative information for performance measurement data or other useful material on performance budgeting implementation.

    Notes

    1. The authors are aware of no other similar product, although such products may exist.

    2. Stat/Transfer can make conversions across many statistical, spreadsheet, and database platforms almost instantly. For more information, please see http://www.stattransfer.com/.

    Appendix C: Spreadsheeting Basics

    The objective of this appendix is to prepare students to use spreadsheets in budgeting. It is divided into two parts. The first part is for those people who have never used a spreadsheet or who have little understanding of what a spreadsheet is or does. The second part reviews some principles that maximize the usefulness of spreadsheets.

    APPENDIX C, TABLE 1: Appearance of Excel in Office 2000, 2003, or XP
    APPENDIX C, TABLE 2: Appearance of Excel in 2007
    Principles for Use of Spreadsheets

    A spreadsheet is a matrix or grid of mathematical cells. Each cell can contain a number or a formula (expression).

    Cells

    The basic unit of a spreadsheet is a cell. A spreadsheet is a matrix of cells that are in columns and rows. Each cell can:

    • Contain information that is either text or numbers,
    • Contain a reference to another cell or cells, and
    • Contain a formula that may contain a reference to other cells.

    Where the action is. When you open your new empty Excel spreadsheet, you will see near the top of the spreadsheet screen a row that contains a white area with “A1” followed by, in most cases, a blue area ending in fx. After this is an empty white area that extends to the right end of the spreadsheet. Whatever is entered into the selected cell will appear in this area (the formula bar). By placing the cursor in this area, you can begin entering data in the cell. However, you can also enter your data and formulas directly into cells.

    A cell may be “selected” by placing your pointer over it and clicking.

    Excel has a default setting such that when you press enter, cell selection moves to the cell beneath the cell previously selected.

    Cell addresses: absolute and relative. A formula refers to other cells by an “address,” which is a label that is unique for a combination of column and row. In more complex spreadsheet “workbooks,” the labeling can refer to tabbed sheet, column, and row. In Excel, the column is specified by a letter from A to Z, then AA to AZ, and so forth, up to IA to IV. This allows a total of 256 columns. Rows are specified by numbers from 1 to 65,536. For Excel 2007 the highest column label is XFD (16,384 columns) and the highest row number is 1,048,576. A typical address would be “A1” (do not include the quote marks). By default, tabbed sheets are labeled Sheet1, Sheet2, and so forth. A typical address including the sheet label would be “Sheet1!A1.”

    Sheets can be renamed by right clicking on the sheet name and following the menu or by double clicking on the sheet name. Columns and rows cannot be renamed (very advanced users may find other ways of referring to them). When a sheet name has an empty space within it such as “Sheet One” it must be contained in SINGLE quotes when used, such as ‘Sheet One’!A1.

    When referring to a cell on the same sheet, you can omit the sheet label; however, it must be included when referring to a cell on another sheet.

    Because one of the more flexible characteristics of spreadsheets is the copying of formulas, there is a feature that allows a distinction between an absolute cell address and a relative cell address. This distinction involves what happens when you copy a formula in a cell. When a cell address is absolute, the copied formula will contain the same cell address when it is later pasted in another cell. When it contains a relative cell address, it will refer to a cell in the same relative position when it is later pasted in another cell.

    An address of the form A1 is a relative address. It is converted to an absolute address by placing a “$” symbol, sometimes called an anchor, in front of both the column and row addresses to become $A$1. There are also two intermediate or partial absolute cell addresses $A1 and A$1. These refer, respectively, to an absolute column location but a relative row location; and an absolute row but a relative column location.
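
    A common use of these anchors (the cell addresses here are only illustrative) is multiplying a column of amounts by a single rate stored in one cell. A formula such as

    =B2*$A$1

    entered in C2 and copied down column C adjusts B2 to B3, B4, and so on, while $A$1 continues to point at the rate.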

    All tabbed sheet addresses are absolute addresses.

    =A1

    This is a relative address. If you enter it in cell C1 and copy that entry to D1 or to C5, the result will be zero (0) because there is nothing in cell B1 or A5.

    =$A$1

    This is an absolute address. You can copy it anywhere and it will still return the value in A1.

    =$A1

    This address is absolute with respect to column but not row. You can copy it to D1 and still receive the results from A1; however, if you copy it to C5, it will return zero (0).

    =A$1

    This address is absolute with respect to row, but not column. You can copy it to C5 or any other row in column C and it will return the value from A1, but if you copy it to column D it will return zero (0).

    =Sheet1!A1

    This is a sheet label in an address. By including the sheet label, you can carry information from one sheet to another. The sheet label is always “absolute,” although the rest of the address still follows the same rules as in the four cases above.

    ='Sheet One'!A1

    This is a sheet label with a space in the middle of the label. Because of the space in the label, the label name is contained in single quote marks in the address.

    Use of the equal sign. The equal sign, “=,” is used in a spreadsheet for two purposes.

    The most common use in Excel (and other products) is to distinguish a cell that refers to another cell or contains a formula from a cell that contains information. The equal sign means the cell refers to another cell or contains a formula.

    When the first entry in the cell is an equal sign, the spreadsheet will interpret the cell contents as a formula.

    If the formula is simply a cell address, the spreadsheet will return the content of the cell at the address in the cell containing the cell address.

    If the formula is valid in the language of the spreadsheet, the spreadsheet will return the results of the formula.

    The benefit of a spreadsheet is that the formula can combine mathematical operations with cell addresses, allowing operations to be broken down into tables that are easy to follow or update.

    If the equal sign in a formula is found somewhere besides the beginning, it is a “logical operator,” which is a term used in advanced spreadsheeting. If you are new to spreadsheeting, this is not the time to begin using that feature.

    The equal sign can be used along with sheet and cell address in combination to carry information from one tabbed page of a spreadsheet to another. Remember, if the formula is simply a cell address (or long address, including sheet) the returned information is the content of the cell address. Information produced on separate pages can be collected together with a simple equal-sign formula.
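
    For example (the sheet and cell references here are only illustrative), a summary sheet can pick up a total calculated in cell B9 of Sheet2 with the simple formula

    =Sheet2!B9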

    Mathematics in Spreadsheets

    Symbols. Excel and most other spreadsheets recognize typical math symbols. These include the “+,” “−,” and “/” for add, subtract, and divide. Multiplication is symbolized with “*” to avoid confusion with the text “x.”

    Formulas. A basic formula will be thus:

    =A1+B1

    Formulas can mix numbers and cell addresses, but we strongly advise against this practice.

    Formulas can extend beyond one function and two cell addresses. If you are familiar with algebra, you will know that there are rules for how multiple actions will occur in a chain of mathematical processes, such as an addition followed by a subtraction, or two subtractions in a row. However, if you do not know or prefer a different order, you can take control through the use of parentheses. Parentheses must be used in pairs. The math rule for parentheses is simple: anything inside a pair of parentheses must be processed before anything outside the parentheses is processed. You do not have to worry about doing this. The spreadsheet will do it. What you need to do is to use the parentheses to tell the spreadsheet to do it.
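
    As a simple illustration with arbitrary cell addresses, the formula =B1+B2*C1 performs the multiplication first, while =(B1+B2)*C1 forces the addition to be processed before the multiplication.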

    In spreadsheets, negative numbers entered in formulas should be entered inside parentheses.

    Summation. Almost all spreadsheet users use some functions. Here we learn a very basic one. It is called summation, or sum for short.

    The formula for sum looks like this:

    =SUM(B5:B10)

    It can also look like this:

    =SUM($B$5:$B$10)

    Any part of any cell address can have an absolute symbol. $B$5:$B$10, or $B5:B$10, or anything else. There are many options, and more turn out to be useful than the new user might think.

    The formula has the same effect as

    =B5+B6+B7+B8+B9+B10

    It is more useful for several reasons, not all of which are discussed here. The most obvious one is that it takes less effort to write, especially as the number of cells gets larger.

    For your later use, we are going to learn that the part inside the parentheses is called an argument. Summation has a minimum of one argument, which, for summation, is a range such as B5:B10. The range of cells can be broken by a comma, such as (B5:B10, B15:B20), in which case there is more than one argument.

    Point and Click

    Spreadsheet development can be made much easier through use of point-and-click technology and the use of automated icons on the spreadsheet toolbar.

    Point and click generally refers to using the mouse, roller ball, or other pointing device to identify some part of the spreadsheet during development.

    Two point-and-click features help ease spreadsheet development.

    Pointing to cells. Often when writing a formula, the cell of interest is off the screen or the cell address is long and cumbersome. With spreadsheets, the solution to this problem is to enter the mathematical operator of interest; then, using the pointer, locate the cell of interest and press the “Enter” key or type the next formula operator. The spreadsheet will “capture” the cell address at the pointer and enter it into the formula for the user.

    When you have completed the formula, you can edit the formula to add absolute cell address symbols as you might need. Edit the formula by selecting the cell and placing your cursor at the desired location in the formula area at the top of the screen.

    Highlighting cells. Functions and some other features often use a group of adjacent cells. Adjacent cells can be selected in a group by highlighting them. For example, you might highlight cells B3 through B11. Begin your function by typing the part of the formula from the equal sign to the open parenthesis. The only function we know at the moment is sum, so type

    =SUM(

    After typing this, use the pointer to find the first cell in the group to be summed. Click on it, and hold the pointer button down while scrolling to the last cell in the group. At this point release the button and type the closing parenthesis or just press enter.

    Toolbars

    For Users of Office 2000, Office 2003, and Office XP.

    The toolbar is a row of icons (graphic symbols) usually located near the top of the spreadsheet. These icons are shortcuts for many features. Although you will need to learn these features as you progress, we will learn one now, together.

    This icon is the capital Greek letter sigma that looks like this “Σ”. This icon is a shortcut for the summation function. If you select the cell at the bottom of a column of adjacent numbers that are to be summed, then press the icon, you will find that the spreadsheet automatically enters a summation formula stretching to the top of the adjacent cells in the column. If you have effectively organized your data, all you have to do is press enter.

    The function will also work for a row of adjacent cells. When there is a conflict (both row and column) the spreadsheet will select the column.

    As you learn other spreadsheet functions, you can explore toolbar icons that might accelerate your development.

    Quick Access Bar and Ribbon for Users of Excel 2007

    What was found on the toolbar in earlier versions of Excel is likely found on the “ribbon” in Excel 2007. The “ribbon” is an extensive menu of actions that can be carried out with the use of a mixture of icons and other graphic interface devices.

    Different subsets of functions are found with the tabs at the top.

    A vestigial toolbar is provided at the very top and is relabeled the “quick access bar.” You can add icon shortcuts to the quick access bar as with older versions of Excel; however, the user is limited to one quick access bar. The bar can be repositioned below the ribbon if the user so prefers.

    The Microsoft Office symbol is a button that calls up a drop-down menu to save or open files, see the most recently used files, print files, and take other common actions.

    Saving your file. To save your file in Office 2000, Office 2003, and Office XP, begin by clicking on the disk icon at the top of your screen. For a previously unsaved file, the result should be the opening of a Save As window.

    You can use the drop-down menu and icons at the top of the screen to change the location where the file will be saved. The default location is “My Documents,” which is easy to access from your Windows desktop.

    You can use the box near the bottom of the screen to provide a name to your file. The default name is Book1.xls. Some settings on Windows may hide the .xls part.

    Supply a file name and click the “Save” button.

    Your file name should now appear at the top of the screen.

    You can now close your file by clicking on the “X” (close) button.

    Next, click on the file-open icon.

    Your file is now in the list. Double click on it, and it opens.

    As you make updates to the file, it is a good idea to periodically click on the file-save button to make sure your saved file is consistent with the file as you have modified it. Although Excel saves a backup copy on a periodic schedule, your best protection against error is a current copy you have saved yourself.

    To save your file in Excel 2007, click the save (disk) icon in the quick access bar.

    As a result you should receive a Save As screen. (Actual entries depend on your Windows operating system and your actual directory entries.)

    Supply a file name and click save.

    The file is now named and saved.

    You can now close your file by clicking on the “X” (close) button.

    Click on the Microsoft Office icon, then the open-folder icon.

    Your file is now in the list. Double click on it, and it opens. Note that if your operating system is Windows 2000 or Windows XP, some parts of the open dialog may look like the open dialog in earlier versions of Microsoft Office.

    Although Excel auto-saves files every ten minutes, it may be a good idea to click on the save icon after making any important file change.

    Printing your file. Excel defaults to printing all areas that are in use or have been used at some time by the user. It also prints in a generally unformatted, unfriendly manner. To change this default, the user must actively set print areas and organize print layout.

    Returning to Sheet1, we first highlight the area we want to print. Then we click on File and follow the menu to Print Area and Set Print Area.

    Excel 2003 or earlier

    Excel 2007

    In Excel 2007, the procedure, after highlighting, is to select the “Page Layout” tab in the ribbon, and then select the print area graphic on the “Page Layout” tab. From there, select the option to set print area.

    Excel now shows that we have successfully set the print area by showing Print_Area in the range name for the still-selected area and by outlining the print area with a dotted line.

    For older versions of Office, if you do not want to adjust the print layout, you can then click the print icon to print the print area.

    If you do want to adjust the layout, for example to change the margins or to change the orientation from portrait (vertical, higher than it is wide) to landscape (horizontal, wider than it is high), you need to open the File menu and find “Page Setup,” which brings up the page setup menus. Various adjustments can be made, such as changing the font scale, forcing the entire output to be no more than one page wide or no more than one page tall (if both are selected, it means only one page), changing print quality, adjusting margins, adding headers and footers, and making other adjustments. When finished, the user can either click OK to save the settings or click Print to save the settings and send the output directly to the printer. Once returned to the main screen, the user can click the print icon to send the print area to the printer using the specified settings.

    Generally, most settings will be saved for each tabbed page of the spreadsheet, so the user must click the correct tabbed page (the one you want to print) and gain focus on that page before clicking the print icon.

    For Office 2007, we use the Microsoft Office button to call up this drop-down menu, then select print. If you want to continue without modifying options, select Quick Print. Otherwise select Print or Print Preview to select additional options. The page setup menu discussed previously can be accessed through the ribbon on the Print Preview screen.

    The print button in the Print Preview screen will send the job to the printer. The job can also be sent to the printer from the OK button in the print menu, or from the Quick Print selection in the initial drop-down menu.

    Maximize the Usefulness of Your Spreadsheet
    Add Columns or Rows

    An entire column or an entire row can be inserted by highlighting the column that is to be moved to the right (or the row that is to be moved down) and then holding the “Control” key down and pressing the “+” key.

    Highlight the entire column by clicking on the reference letter, or highlight the entire row by clicking on the reference number.

    Delete Columns or Rows

    An entire column or an entire row can be deleted by highlighting the column or row to be removed and then holding the “Control” key down and pressing the “–” key.

    Freeze Panes

    Freeze Panes is found on the menu bar under “Window” in older versions of Excel or on the “View” tab of the Excel 2007 ribbon.

    Place your pointer at the location where you want your row headers and column headers to become permanently visible. In this example, cell B2 is selected, so row 1 and column A will become permanently visible. Select Freeze Panes as described above. In Excel 2007, you can select whether to freeze rows, or columns, or both. In earlier versions, both are selected by default unless you place your cursor in row 1 or column A.

    Afterward, you can scroll down or to the right, but row 1 and column A do not scroll off the screen. This feature is particularly helpful when working with large tables.

    Wrap Text

    When your header is long, you can wrap the text onto several rows in the same cell. On the “Home” tab in the ribbon, select “Wrap Text.”

    The header text now wraps onto multiple lines within the cell.

    The text might look better if you center it by using the centering icon. The process in older versions of Excel is shown here:

    Select Format Cells from the Format menu

    In the interactive menu, click on the “Alignment” tab and click on the Wrap text box to add a check.

    Finish by clicking on the center text icon.

    Error Messages

    #REF!. The message #REF! usually occurs while you are developing a spreadsheet; it occurs for one of two reasons. First, the developer copies a formula with a relative address, and when it is pasted, the relative location falls outside the matrix of the spreadsheet. The correction for this is to rewrite the formula.

    Second, the developer deletes a column or row in which there is a cell to which a formula refers. The correction for this, provided that the spreadsheet has not been saved, is to undo the delete using the undo icon. If the spreadsheet has been saved, the correction is to rewrite the formula.

    #DIV/0!. The message #DIV/0! arises when there is a division by zero. Division by zero is not valid. Occasionally this message arises with a more complex function such as AVERAGE, where there are no values entered into the column to be averaged. More commonly it arises because there is some division in a formula that is copied and pasted to many cells. Sometimes the divisor is zero, and the error message results. If, for report purposes, this message is undesired, see the solution explained with the IF function in Appendix E.

    #VALUE!. The #VALUE! message arises because the formula refers to a cell that has an illegitimate value for the formula. Typically this mismatch is a text value where the formula calls for a number. The correction is to find the text and replace it with a legitimate value, or to rewrite the formula.

    #NAME?. The #NAME? message arises when an invalid function or range name is entered. The correction is to edit the formula and replace the invalid name with a valid one.

    #NUM!. The #NUM! message arises when a function or formula gives rise to a mathematically invalid result. For example, negative numbers have no square roots, so a function that asks for the square root of a negative number will receive this error message.

    Circular error message. The circular error message arises when a formula refers to the cell that contains it. Excel cannot process such a formula and displays a circular reference warning. The formula must be rewritten.

    How to Make an XY Plot in Excel 2003 or Earlier

    These procedures are parallel to those in Chapter 4, which review how to make an XY plot using Excel 2007. These procedures create the same plot using Excel 2003 or earlier. Users of Excel 2007 are referred to Chapter 4.

    Highlight the columns that contain the data. Columns can be found in the sheet labeled “Tbl 4.2” in “Budget Tools Chapter 4 (for student) .xls.”

    The column that will serve as the X-axis label must be the leftmost of the columns highlighted. Start at the labeled header row. You can skip a column or row by holding the “Control” key down and beginning the highlight again at the next column or row you want to include. If the labels are not immediately above the data, you can either enter labels manually or hold the “Control” key down and then highlight the row that contains the labels. Take note, the labels row must be in the same order as the data.

    Next, select the chart wizard icon on the toolbar to start making the chart.

    Third, on the first menu select XY (scatter).

    Later you may want to explore various subtypes, but for now, select the default subtype and select next.

    Fourth, the “Step 2” screen displays two tabs at the top and a small version of your plot in the main area. Usually you simply select Next to accept this screen. For now, first look at the tab labeled “Series” and make sure the labels are satisfactory. If not, click “Cancel” and start over. Be very careful about what you highlight. For your first try, it may be best to use only contiguous rows and columns. When you get to this step again, if the labels are still not satisfactory, click on each series in the left-side box and type the label you want in the “Name” box. Then click Next.

    Fifth, in step three you enter a chart title and labels for the X and Y axes. You can also select the other tabs on this screen to improve the format of the chart. We recommend removing the distracting gridlines and moving the legend to the bottom.

    Select “Next” once more.

    At Step 4, select a location for the chart. You will have more control if you select “As new sheet” and enter a name of your choosing in the box. Then select “Finish.”

    In this case the resulting chart appears as a new sheet with the name you supplied.

    Double click on the Y (vertical) axis to bring up the Format Axis menu (it may open on any of the tab pages listed at the top).

    Select Scale and restrict the Minimum to 2000 and Maximum to 9000.

    Select Number and change the format to Currency with zero (0) decimal places and click OK.

    Select the X (horizontal) axis and adjust it in the same way.

    The gray background can be removed by clicking in the gray area and pressing the “Delete” key.

    Your plot is now formatted. You can explore the menus and the effects of double clicking to format the plot further.

    Exercises

    Note to students: Later exercises will likely build on earlier exercises. You may want to copy the files you make to removable media that you can take with you.

    Exercise 1.

    This exercise provides practice in the principles of spreadsheets.

    • Open a blank spreadsheet.
    • Before making any entries, save the empty file using your last name and first initial as the file name. Particularly for students, it is helpful to use your name in file names you submit to other individuals; it is best to start that practice now.
    • Rename Sheet1 as “Practice.”
    • On the renamed sheet, enter the following values: In Cell A1 enter 15, in Cell B1 enter 5.
    • In Cell C1, add cells A1 and B1.
    • In Cell C2, subtract B1 from A1.
    • In Cell C3, multiply A1 by B1.
    • In Cell C4, divide A1 by B1.
    • Click on Sheet2 (for the instructions for the following problems) and enter the following numbers in cells B2 through B8 {15, 78, 23, 60, 42, 12, 31}.
    • Enter the following numbers in cells C2 through C8 {45, 60, 97, 15, 30, 38, 15}.
    • Place your pointer in cell B9 and use the summation (Σ) icon to total (sum) cells B2 through B8.
    • Keeping your pointer in cell B9, click the copy icon to copy your formula.
    • Place your pointer in cell C9 and click the paste icon to paste your formula.
    • Place your pointer in cell B12 and enter a function/formula that sums all the entries in cells B2 through B8 and all the entries in cells C2 through C8 using two (2) arguments.
    • Print out your work from “Practice” and “Sheet2.”

    Spreadsheet Principles

    The purposes of these principles are to make your spreadsheets easily verifiable and to reduce risk associated with buried or inconsistent information. Another important consideration is to take maximum advantage of the power of spreadsheets.

    Data. Under many circumstances data should be entered in the form of a database. This principle is particularly important when there is a lot of information or when the information is subject to change. Databases have these characteristics:

    • Columns have unique column labels at the top.
    • Columns are adjacent.
    • Rows are adjacent.
    • There is a column that contains a unique identifier for each row; usually this is the first column.
    • Missing data are represented by leaving cells empty.
    • No formulas are entered into data cells.
    • In some circumstances a column is a transformation of one or more other columns, in which case the entire column contains the effect of the same formula.
    • Different sets of data are entered in different databases. In the modern spreadsheet, this can be done by using a different tabbed sheet for each set of data.

    Formulas. Spreadsheet cells can contain data, and they can contain formulas. An error arises when they contain both. There are several reasons this is a problem:

    • The data disappear from view, buried inside the formula.
    • The highest functionality of the spreadsheet is to easily change data, which is not possible when the data are hidden inside formulas.
    • Often in spreadsheet development, formulas are copied, in which case hidden “hard coded” data can lead to errors.

    Tables. Tables are where most budget spreadsheeting occurs. Tables bear a lot of similarity to a database. The difference is that some columns or rows contain data and some are calculated. Other rules are largely the same.

    • Tables should not involve gaps between columns to improve appearance, and gaps between rows should be minimized.
    • Under most circumstances, one table should appear on one tabbed sheet.
    • Columns and rows should be clearly labeled with unique labels.
    • No cell should contain both a formula and data.
    • Where data are carried from one tabbed page of the spreadsheet to another, it is best to reveal them, rather than bury them inside a formula.
    • Tables may be used for presentation, so guidelines for formatting provided in Appendix D should be followed.

    Simplification. If you know two ways to accomplish something in a spreadsheet and one of them is simpler, use the simpler one unless you have a very good reason to use the other. For example, we know two ways to add all the cells from cell B5 to B10:

    =SUM(B5:B10)

    and

    =B5+B6+B7+B8+B9+B10

    The simplification principle says to use the first way. Advanced users sometimes have very good reasons not to use the simplification principle. Professors sometimes use more complicated methods so students will see what is going on. New users usually do not have a good reason to avoid the simplification principle.

    Rounding vs. not rounding. Prior to the rise of computerized mathematics, analysts had to decide before calculating at what level of specificity to round. Even if one stops at some very insignificant decimal place, one is rounding somewhere. The typical practice was to round early, because the digits beyond the first few have little impact on the magnitude of the final number.

    In the computerized environment, matters are different. It can be more work for both the analyst and the computer to round than not to round. There is rounding, of course, but it occurs at some insignificant location, eight to sixty-four or more decimal places out.

    Sometimes analysts do not like to work with such specific numbers and tend to force rounding to whole numbers or even to broader levels. This forced rounding is unnecessary.

    The most effective approach is to calculate in all the glorious specificity, but represent the final numbers at a level of confidence that is reasonable for the data analyzed. Typically, for example, a budget in millions of dollars might be confident at the hundred thousand level. Digits below that may be misleadingly specific. When, later, techniques are provided for rounding, this rule should be considered.

    As a general practice when calculations are performed before rounding, or whenever summations may appear awkward with rounding involved—as with percentages—the following disclaimer should appear: “Totals may not sum because of rounding.”

    Power of the spreadsheet. The spreadsheet is not a word processor. Its function is to calculate, not merely to present. Wherever information flows from one location to another, the information should travel through formulas. No calculations should be conducted offline with the results entered into the spreadsheet. Nor should they be copied by retyping. Use the spreadsheet for its power, which is to take your newest bit of information and recalculate your results without intervening manual steps.

    Documenting spreadsheets. Spreadsheets are useful only if they can be interpreted. When a creator first makes one, he or she knows what every cell represents. But soon this knowledge dissipates. Text entries should be included throughout the spreadsheet to explain what cells are doing, what the calculations are intended to accomplish, and where information in cells originated.

    Copy and paste to Word documents. Results produced in spreadsheets are often used in reports. Appendix D provides tips on table formatting. However, one additional consideration is how to incorporate results directly into word processing files. More and more reports are communicated electronically.

    Transmitting portions of reports in Excel files can be problematic for several reasons. First, it may require considerable effort to ensure that the recipient knows how to navigate the file in order to find the report. Second, formatting for the reader requires consideration of what the reader might do to print the file; a report that looks attractive on the screen might print poorly unless considerable additional work is done. Third, possession of the spreadsheet may tempt the report recipient to second-guess the analyst on analysis or assumptions.

    When transmitting the report electronically, it is best to copy specific tables into the appropriate sections of the word processing document. Using the universal copy-paste functions of Windows-based software may lead to additional formatting effort, but it will pay off in the end. The copy and paste icons found on the toolbar will greatly speed your use of these functions.

    Exercise 2.

    This exercise provides practice in using copy-and-paste techniques.

    • Copy your print areas from Exercise 1 in this appendix: copy them from Excel to Microsoft Word.
    • Save the results in a Word file. Print out the Word file. (Note, no guidance is provided in this text for Microsoft Word.)

    Did you follow the file naming guidelines suggested in Part 1?

    Here we look at formulas and functions that have special application in budgeting. This section does not explain the use of functions in budgeting; it provides practice in how to get the spreadsheet to do them.

    Subtotal. Subtotal is a function that is similar to summation. It has a minimum of two arguments. The FIRST argument is a number from 1 to 11 that tells subtotal what specific function to perform. For budgeting we are interested in the number 9, which tells the spreadsheet to perform summation. The second and subsequent arguments are ranges of cells that contain the data to be summed.

    So, the formula looks like this:

    =SUBTOTAL(9,B2:B10)

    The specific benefit of the subtotal function is that when two or more instances of subtotal are calculated and one includes the cell that contains the other, the included subtotal is not calculated in the containing subtotal. This is very useful when summing up a long column of numbers where there are intermittent subtotals within, which happens quite often within budgeting. So, if the first five entries are subtotaled, then the next seven, then the next twelve, etcetera, to the bottom of the spreadsheet, using the summation formula, the analyst must then hunt for the specific totals and add them up using: =B11+Bxx+Byy+ ….

    However, using the subtotal function, the analyst simply sums the whole column using the subtotal function rather than the summation function.

    For this function to work, it must be used in both the internal subtotals and at the grand total. Also, the grand total must total all the values contributing to the internal subtotals.
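
    As a sketch with illustrative cell locations: if one group of values occupies B2:B6 with =SUBTOTAL(9,B2:B6) in B7, and a second group occupies B8:B13 with =SUBTOTAL(9,B8:B13) in B14, then a grand total of =SUBTOTAL(9,B2:B14) adds only the underlying values and ignores the two embedded subtotals.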

    Percentages. There is a general formula that is widely known to the effect that percentage is found by:

    percentage = (n / N) * 100

    where

    n is the number of observations or other value of a subgroup, and

    N is the number of observations or other value of the whole group.

    The “times 100” simply moves the decimal point two places to the right, thereby changing the representation of the number from a decimal to a percentage.

    To CALCULATE with a percentage thus produced, the first step is to DIVIDE by 100.

    Spreadsheets have rendered this approach obsolete. This obsolescence is because the visual presentation of a number can differ from the information contained in the cell. For example, the cell can contain the number 0.01, but it can visually present it to the screen and to the printer as 1%. The “%” icon on the toolbar will do this in one easy step.

    This means that for the spreadsheet it is unnecessary to multiply or divide by 100 to achieve a visual presentation, which can, instead, be achieved through data formatting.
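
    For instance, a cell containing the formula =45/50 holds the value 0.9; clicking the “%” icon displays it as 90% while the value used in any further calculation remains 0.9.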

    Rounding. Earlier we discussed rounding as compared with not rounding. The recommendation is to not round until calculations are completed. Rounding can be troublesome in some of the same ways that percentages are troublesome. Thus, rounding should be performed only at the very end of a series of calculations.

    The function for rounding is ROUND, and it has two arguments. The first argument gives the cell address for the number to be rounded. The second argument gives the decimal place for the rounded number. Positive numbers are spaces to the right of the decimal (smaller than one); negative numbers are spaces to the left of the decimal (larger than one).

    An example might be

    =ROUND(B5,-3)

    The effect of this would be to round the number in B5 to thousands.

    Suppose I have the number 345,678 in cell B5. If I use the function =ROUND(B5,-3), the spreadsheet will not only present, but also calculate with the information 346,000.

    When producing a table that contains rounded information, it is best to round only the final product, the number that the table produces. At the end of the table, first show the exact number produced by the calculation then, on the next row, show the rounded number. Use the rounded number in related text.

    If the table contains several rounded numbers that are summed, these considerations apply. First, all numbers should be rounded to the same decimal level. Second, the spreadsheet should contain a second shadow table that has the same calculations without rounding to determine the effect of rounding. Third, the reported table should contain the disclaimer, “Totals may not sum due to rounding.” Fourth, under no circumstances should such a table be produced if the analysis involves multiplication, division, more complex math, or variances.

    Rebuilding an Interest Rate. Interest rates can be represented in two remarkably different ways. For example, five percent can be represented either by 0.05 (5%) or by 1.05 (105%). The first number represents the change from period one to period two achieved through interest rate growth. The second number represents the growth factor that transforms one period's whole amount into the next period's whole amount.

    There are times, especially when multiplying, that it is necessary to calculate in the whole number rather than just the change value. The simple method of doing this is to enter the number 1 in one cell and the “interest rate” (the change part) in another cell. Then in a third cell add them together. Reference the third cell for subsequent calculations. This leaves the change part of the interest rate available for alternate scenario values.

    Suppose we have 1 in cell A1, 0.05 in cell B1, and the formula =A1+B1 in cell C1. Subsequent calculations refer to cell $C$1 (using absolute cell addresses to avoid copy and paste errors, if necessary). Later we come to believe that the correct interest rate is 3%. We adjust cell B1 and all other relevant cells adjust automatically.
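
    Continuing this sketch with a hypothetical amount in cell D5, the formula =D5*$C$1 grows that amount by one period, and the absolute reference lets the formula be copied wherever the growth factor is needed.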

    The exponential. The exponential is a special number in math that tells the computer (either human or electronic) how many times to multiply a number by itself. This information is particularly interesting when working with, for example, interest rates. If you want to know the effect of an interest rate five years in the future, the answer is to multiply the rate by itself five times. Assuming our interest rate is 1.05, as described in the last section, this would be represented by 1.05^5, or (1.05)^5. Spreadsheets do not have an effective way to process an exponent written as a superscript, so instead they use the caret symbol, “^”. Let us assume that we have the value 1.05 in cell C1 as described in the last section. In cell D1 we enter the number 5. In cell E1 we enter the formula

    =C1^D1

    The effect of this will be to tell the spreadsheet to return the value of

    1.05 * 1.05 * 1.05 * 1.05 * 1.05, which is approximately 1.2763

    If you multiply the result of that formula by some actual amount, it will tell you what the resulting amount should be after five years.
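
    For instance, if a hypothetical deposit of $250 is entered in cell F1, the formula =F1*E1 returns approximately 319.07, the value of the deposit after five years at 5 percent.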

    Net present value. The Net Present Value function is used to calculate the effect of an interest rate on flows of money. It may produce a shortcut to some of the calculations discussed above. The function has two arguments. The first one is an interest rate as expressed in cell B1 in the two previous sections, that is, the change amount, not the whole amount. The second one is the flow of money in equally spaced periods beginning with period ONE (1). A flow of money is a set of amounts such as the amount of income you receive at the end of each year. Take note, if there is an amount associated with period ZERO (before any interest accrues), it should not be included in the flow that is considered in the NPV formula.

    Suppose you have a flow such as {45, 45, 45, 45, …, 45} representing $45,000 a year in income beginning in year one, in cells G1 through G10. You have the interest rate of 0.05 in cell B1. In cell G12 you have the formula

    =NPV(B1,G1:G10)

    This formula returns 347.48 (about $347,480, since the cell entries are in thousands of dollars), which is the value in year zero of this flow of money. Note that this amount is less than the sum of the column. An explanation appears in the Capital Budget section.
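
    If there were also a period-zero amount, say an initial payment entered in a hypothetical cell F1, it would be added separately rather than included in the NPV range: =F1+NPV(B1,G1:G10).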

    Sorting Data.

    From the menu bar, select Data and follow the menus to “Sort…” Do not select the sort icon from the toolbar unless you are sure that you have a simple sort. The menu bar option will take you to the sort menu.

    With the older Excel menu, you can select up to three key columns for the sort. For each column you can select “Ascending” (A to Z or 1 to 10) or “Descending” (the reverse). Generally the menu will default to the “Header row” radio button under “My data range has,” particularly if there is text in the first row and numbers in the other rows. If the first row of data is among the data to be sorted, click instead on the “No header row” radio button. Generally you will not need to click on the Options button unless your sort data contain dates, in which case you should follow the menus.

    After you click OK, the data are sorted.

    In Excel 2007 you find sort on the Data tab.

    With Excel 2007 you can add multiple sort levels up to the number of columns you have using the Add Level button. Generally, you will have no reason to change the default Sort On “Values.” The “Options” button allows you to select sort left to right rather than the default top to bottom. The “Order” drop-down menu allows smallest to largest, largest to smallest, or a custom list which may be used for certain calendar entries. Check the header box as appropriate.

    Named ranges. Spreadsheets use named ranges to reduce the difficulty of referring to a cell, group of cells, or table repeatedly. They are particularly useful when a table is produced for use elsewhere in the spreadsheet. The named range is used instead of a cell or range address. For example, if we have named the range G1 through G10 as RangeOne, we can use the following formula

    =SUM(RangeOne)

    It will have the same effect as:

    =SUM(G1:G10)

    There are some rules:

    Only contiguous areas can be named ranges.

    Range names cannot contain empty spaces.

    Range names cannot be equal to cell addresses (A1 through IV65536; see also larger addresses for Excel 2007).

    Range names cannot be equal to function names.

    There are a few other special words used by spreadsheets that should not be used as range names, but you will usually run into them only through misfortune or deliberate risk taking.

    To name a range:

    Highlight the entire area that is to be named.

    Above the top left cell of the visible spreadsheet there will be a white area that contains the cell address of the top left cell of the area you have just highlighted.

    Place your pointer in that area and click.

    Type the name you have selected for the range name and press enter.

    The range name should now appear where the cell address previously appeared.

    Beside the area you have just typed in there is a small drop-down arrow. If you place your pointer on it and click, you should see a list containing the range name you have just created.

    Vertical lookup. Vertical lookup is a database function. It allows the user to use one bit of information to find another. Perhaps you know a person's name and have a sorted list of names and salaries. Vertical lookup allows you to use the name to find the income. This is a very useful function for skilled spreadsheet users. Vertical lookup has three arguments. The first one is the target information, such as the name of the employee. The second one is the range in which the database can be found—typically a named range. The third one is the number of columns, beginning with the first column as 1, from the key column (which must be the first column) to the column that contains the information. By definition a database should be wider than one column.

    Suppose you have a salary database in a range named Salaries. In cell D5 you have the name of a person whose salary you want to use in a calculation (spelled exactly as in the database). In the database the first column is the name. In the second column is the job classification. In the third column is the year. In the fourth column is the salary information. For your purposes, it is the salary information that you want. There could be other information in other columns. In another cell you enter the formula using the function as follows:

    =VLOOKUP(D5,Salaries,4)

    This function will work properly only if the database is sorted by the names column in the database and only if the names are spelled the same in both locations.
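
    As a brief illustration of using the result in a calculation, and assuming a hypothetical 3 percent raise, the formula =VLOOKUP(D5,Salaries,4)*1.03 returns the person's salary increased by 3 percent.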

    Ranking data. The Excel ranking function is not exactly equal to the statistical ranking function, but it is sufficient for ordinary communication. It has three arguments. The first argument is the cell address of the number you want ranked. The second argument is the range of cells that contain the other numbers to which this number is to be compared. The third argument is either 0 or 1. If it is 0, the smallest number will have the largest number rank; if it is 1, the smallest number will have the smallest number rank. So an example might be:

    =RANK(F1,F1:F10,1)

    Suppose {30;18;29;56;43;90;12;56;45;31} are in cells F1 through F10. The example formula returns the resulting rank of 4.

    Average. To calculate a statistical mean, Excel uses a function labeled “Average.” Assuming that you have some data in cells B5 to B10 for which you want an average, the formula looks like this:

    =AVERAGE(B5:B10)

    This will have the same effect as:

    =SUM(B5:B10)/N

    where N is the count of the cells from B5 to B10 that contain an entry.

    You can determine N with a function “Count,” so you can also determine the average with:

    =SUM(B5:B10)/COUNT(B5:B10)

    However, this violates the simplification principle discussed earlier, so it should not ordinarily be used. It is shown here to explain what the Average function does.

    If you want an average of two non-contiguous ranges, you can include two arguments in your formula. Suppose you want the average of B5 through B10 and B12 through B17, but not B11. The formula would be:

    =AVERAGE(B5:B10,B12:B17)

    Standard deviation. Standard deviation is a very common statistic. When used without any other reference, it is the standard deviation around the mean (average). There are two different concepts, known as the sample standard deviation and the population standard deviation. Under almost all circumstances where their values are sharply different, the sample standard deviation is likely the right one to use. When in doubt, use the sample standard deviation. Excel will produce either. Following is the function for sample standard deviation, assuming that we have the same data we have just used to calculate the mean:

    =STDEV(B5:B10)

    Any common statistics book will explain the math underlying this procedure.

    Dates

    Budget spreadsheets commonly contain date information. In the “bad old days” it was somewhat difficult to enter dates. Today, you can enter a date in standard American format, such as “7/25/07.” The spreadsheet interprets any valid date in the two-slash format as a date. You can also enter it as “7-25-07” or “July 25, 2007.” The graphic shows the effect of entering the date in these three formats, respectively. All three of these are valid dates in Excel.

    Exercises. Use the spreadsheet called “Budget Tools Appendix C Exercises.” The tabbed sheet is labeled “Part 3.” The problems described here should be completed there with the data that can be found there.

    Exercise 1.

    In the yellow highlighted cells, compute the subtotals for the component tables, and in the green highlighted cell compute the total for the entire column.

    Exercise 2.

    Use the appropriate spreadsheet method to display the numbers as percentages.

    In the yellow highlighted cell show the percentage for 45 / 50.

    Exercise 3.

    In the first column there are numbers to be rounded to the degree shown in the second column. Put the results in the yellow highlighted area in the third column.

    Exercise 4.

    • In the highlighted cell, show the first year of interest on $250 at 5 percent.
    • In the highlighted cells show the cumulative value for each year from year 1 to year 10 on $500 compounded at 3 percent per year. What is the final total value of the deposit?

    Exercise 5.

    • Sort the information in the table by name.
    • Name the table that contains the data, but not the headers, “Ages.”
    • Use Vlookup to find Henderson's age.
    • Use Vlookup to find Wilson's gender.

    Exercise 6.

    • Find the rank for all cells with the largest value having the rank of 1, place in the highlighted cells.
    • Find the average of the data; place in the highlighted cell.
    • Find the standard deviation of the data; place in the highlighted cell.

    Exercise 7.

    Enter today's date in the highlighted cell. Use a traditional numeric entry such as ##/##/## (for example, 07/27/07).

    Enter today's date in the highlighted cell. Use a text form such as Month ##, 20## (for example, July 27, 2007).

    Appendix D: How to Produce Clear, Attractive Reports

    After students learn how to use a spreadsheet, they should turn their attention to making that spreadsheet readable and attractive. They will discover that, in the workplace, the readability and attractiveness of their work is as important as the work itself.

    How to Use Labels

    In a spreadsheet, labels are in the top row of the data and in the left column. The labels explain what is in the row or column. It is extremely important that labels are chosen with care so that the reader immediately understands what is in that column. For example, “wmyrsed” is not very clear, while “women mayors yrs. of ed.” is quite clear, and the latter label can fit in the top two rows of a column. Your supervisor does not wish to spend time finding out what each label means. The label should be intuitive. This is not the case in SPSS, in which there is a limit of eight characters for labels. Spreadsheets have no limit on the number of characters in a label. We are limited only by our common sense.

    How to Use Bold Text

    Bold your labels in the top row and center the labels. Compare these two spreadsheets; Table 1 is unformatted; Table 2 has formatting for the column heads.

    APPENDIX D, TABLE 1: Sample Table without Formatting
    APPENDIX D, TABLE 2: Sample Table with Formatting

    Already the spreadsheet on types of hospital payments looks far better because we bolded the labels of the columns. How much better does this chart look if you take the title out of the spreadsheet and place it above the table, in bold? See Table 3.

    APPENDIX D, TABLE 3: Sample Table with Title in Prominent Position

    Now the spreadsheet really sings. It is simple, clear, and attractive to the reader. Another method to make your spreadsheet highly polished is to set off the header rows with rules. See the same spreadsheet with rules in the header rows (Table 4).

    APPENDIX D, TABLE 4: Sample Table with Rules Outlining the Header Rows
    How to Use Indentations and Careful Categorization

    Often a fiscal analyst is asked to create a budget with revenues, expenditures, and net. This can be done accurately yet be poorly formatted, or it can be accomplished with a great deal of finesse that requires little time, only sophistication.

    Table 5 is readable, but because of the way it is formatted, it is not very informative.

    APPENDIX D, TABLE 5: Sample Table without Formatting and with Embedded Title

    Table 6 is far more readable and easy to understand, with lots of clear information.

    APPENDIX D, TABLE 6: Sample Table with Formatting, Prominent Title, and Indents

    What is so special about the second spreadsheet is the bolding of labels, the careful creation of subcategories that are helpful to the reader, and the indentation used for categories. All of these details make for a far more readable spreadsheet.

    How to Use Rules

    Rules delineating specific cells in a spreadsheet can indicate that some columns belong together and others do not. In Table 7, program services are separated from support services, which has two categories, fund-raising and administration. Now the reader understands that support services comprises both fund-raising and administration.

    APPENDIX D, TABLE 7: Sample Table Showing Rules That Span More than One Column
    How to Build Assumptions into a Spreadsheet

    Creating proposed budgets can be complicated. The budget may remain complicated, but it is essential that it be laid out clearly enough that the reader understands the assumptions made in creating it. In Table 8 the assumptions are presented in a format that can be used both to create formulas and to communicate clearly to the reader.

    APPENDIX D, TABLE 8: Sample Table with Formatting and Assumptions Following the Table

    Often budgets have assumptions built into them. For example, salaries may include 3 percent of the salary for health insurance and 20 percent of the salary for the pension fund, so that the true cost of a position is higher than just the salary. The true costs include the fringe benefits. In Table 8, there are several assumptions, which are listed in the assumptions following the spreadsheet.
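    For example, under these assumptions a position with a salary of $50,000 (a hypothetical figure) would carry a true cost of $50,000 × (1 + 0.03 + 0.20) = $61,500.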

    In addition, the spreadsheet is designed in such a way that the readers understand how much each position costs before the total budget is calculated. This is very useful for any analysis that might be needed about the proposed budget.

    Table 8 shows one way to present this information. However, a more useful way to organize the data is to place the assumptions in the spreadsheet (see Table 9). This has two advantages. The first is that the reader can easily see the effect of the assumptions, and the second is that the fiscal analyst can use the numbers to create formulas for the spreadsheet. As much as possible, assumptions should be clearly stated in the spreadsheet.

    APPENDIX D, TABLE 9: Sample Table with Formatting and Assumptions Embedded in the Table
    Exercise

    Below are data that need to be placed in a spreadsheet. The challenge is to create a proposed budget for the public library of Smithtown. Here is all the information you need:

    Proposed budget:

    • 5 percent increase in salaries except for clerical staff, who get a 6.5 percent increase
    • Assume fringe benefits are the same as in the Fire Department's proposed budget
    • One chief librarian at a salary of $75,893
    • Two librarian positions at Level II at a salary of $61,234 each
    • Four librarian positions at Level I at a salary of $54,469 each
    • One management information system position at a salary of $71,089
    • Five clerical positions at a full-time salary of $31,789 each; three are budgeted at half time, and the other two are full time.

    Create the spreadsheet and answer the following questions:

    • What is the salary plus fringe benefits for one Level II librarian?
    • What is the total number of positions, not people?
    • How much is the library spending on health insurance for a Level I librarian in the proposed budget?
    • What is the total proposed budget for the library?

    Appendix E: Advanced Spreadsheeting for Budgeting

    These spreadsheet skills are for the user who has attained the skills in Appendix C. They are more advanced skills that are used for more complex spreadsheet purposes. All the skills demonstrated here are used in spreadsheets supporting this text.

    Logic Functions

    Logic functions are functions that test a cell for a truth condition and either return a truth value, or perform another operation depending on the truth value. The most common logic functions are “=,” “>,” and “<” (equal, greater than, and less than).

    The simplest use of a logic function is to test a cell for its truth value. Suppose you have the value 35 in cell C1 and a value in cell E1. In cell D1 you enter the formula =C1=E1. This is equivalent to the question: Is the value in E1 equal to the value in C1? In the first graphic, the result is “TRUE.” In the second graphic, it is “FALSE.”

    Sometimes values past the decimal are hidden on purpose, or there is a long column of numbers to be checked, so simply returning the value true or false is all that the user needs.

    However, other times, the user wants to condition some calculation on the truth value. There are two ways to condition calculations on the results of this logical test. The first (less complex, but involving more skill) rests on the partial equivalence of “TRUE” with “1” and “FALSE” with “0.” Thus, the formula =(C1=E1)*1 will equal 1 if C1 is equal to E1 and will equal 0 otherwise. This binary value can frequently be used to produce conditional formulas because a formula keeps its value when multiplied by 1 and becomes zero when multiplied by 0.
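    For example, a formula of this form (the particular calculation is only illustrative) adds C1 and E1 when they are equal and returns zero otherwise:

    =(C1=E1) * (C1 + E1)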

    The other, somewhat more elegant, but also more complex, method of conditional formulas uses the IF formulation, which has three arguments.
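    A formula of this form (an illustrative equivalent of the binary formula above) shows the three arguments:

    =IF(C1=E1, 1, 0)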

    The first argument is the logic query, the second argument is the formula to use in the case that the logic query returns a value of true, and the third argument is the formula to use in the case that the logic query returns a value of false. The given example has the same effect as the formula given in the previous graphic.

    However, with the IF formulation, the conditional formulas can be entered directly into the second and third arguments.
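    For example, a formula along these lines matches the behavior described next:

    =IF(C1=E1, C1+E1, C1-E1)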

    In the example in this graphic, if the conditional returns “true,” C1 and E1 are added; if it returns “false,” as it does here, E1 is subtracted from C1.

    The IF function can be used to prevent #DIV/0! errors as shown in this graphic.
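    For example, a formula of this form (the cell references are only illustrative) returns zero instead of the error when the divisor is zero or empty:

    =IF(E1=0, 0, C1/E1)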

    AND Function and OR Function

    The “AND” and the “OR” functions concatenate multiple logic statements. The AND function will return “FALSE” unless all of the individual statements are true. The OR function will return “TRUE” if at least one of the individual statements is true. In the graphic, the AND formula returns “FALSE,” while the OR formula returns “TRUE.”

    In both cases cell A1 is empty and not equal to D1, while B1 and C1 both contain the value 3.
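    For illustration, assuming the two logical tests are A1=D1 and B1=C1, the formulas could take this form:

    =AND(A1=D1, B1=C1)

    =OR(A1=D1, B1=C1)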

    The AND and OR functions can be embedded in the IF function to produce more advanced conditional statements. Here, when the test is true, B1 is added to D1; when it is false, B1 is multiplied by D1. The formula that embeds AND returns 9, which is 3 * 3, while the formula that embeds OR returns 6, which is 3 + 3.
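    For illustration, assuming the same cells and assuming that D1 also contains the value 3, the two formulas could take this form:

    =IF(AND(A1=D1, B1=C1), B1+D1, B1*D1)

    =IF(OR(A1=D1, B1=C1), B1+D1, B1*D1)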

    CELL Function

    The CELL function looks up characteristics of the designated spreadsheet cell. In the example formula, it looks up the cell address. This function has two arguments. The first is the characteristic desired (for other possible characteristics, refer to the Excel help screen). The second is the desired cell. Regardless of whether the address in the formula is relative or absolute, the returned address will be displayed as absolute.
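    For example, using cell E2 (the same cell used in the discussion that follows), the formula

    =CELL("address", E2)

    returns $E$2.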

    There are times when the spreadsheet developer wants to put a reference cell address in another cell. Because it is relatively easy to add or delete rows or columns, this fixed reference to a cell address can be unreliable. By using this function, the explicitly listed cell address will keep up with the actual cell address.

    The INDIRECT function treats the cell address in the referenced cell as if it were entered in the formula at the location of the indirect function. The function takes one argument, the cell that contains the indirect cell address.

    Suppose you have a number in cell E2. For whatever reason (such reasons usually involve spreadsheet complexities), you do not want to insert E2 directly in your formula. You enter E2 or =CELL("address", E2) in cell C2; in cell D2 you enter:
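    That is, a formula of this form:

    =INDIRECT(C2)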

    The indirect formula returns the same result as if you had entered =E2 in cell D2. Sometimes, this function can produce a relatively easy way to make a table referring to non-contiguous source data. There are also other uses of this function for more sophisticated spreadsheeting.

    The ABS function takes one argument. It returns the absolute value of the number in the referenced cell. Suppose you have negative three in C2. In cell D2 you insert the function
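    The formula takes this form:

    =ABS(C2)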

    The result is the absolute value, which is positive three.

    Maximum and Minimum

    The MIN and MAX functions examine a range of data to find the minimum or maximum value. In the graphic, the MIN formula returns 3, the lowest value, while the MAX formula returns 9, the highest value.
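    For illustration, assume (hypothetically) that cells B1, C1, and D1 contain the values 3, 6, and 9. Then the formulas could take this form:

    =MIN(B1:D1)

    =MAX(B1:D1)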

    Square Root

    Square root is used in finding RMSE, which is explained in the glossary. It is also a common mathematical function that finds the number that, when multiplied by itself, results in the queried number. In the graphic, the square root formula returns 9, which is the number that, when multiplied by itself, results in 81 (the queried number).
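    For example, a formula of this form produces that result:

    =SQRT(81)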

    Parsing Dates

    Sometimes the user wants only a part of a date: the day, the month, or the year. Excel has functions that readily extract this information.

    The DAY function returns 25, the day of the month in the example. The MONTH function returns 7, the month in the example. The YEAR function returns 2007, the year in the example. The alternative format is shown to demonstrate that Excel is interpreting the underlying information, not the format.
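    For illustration, assuming the date July 25, 2007, is entered in cell A1 (a hypothetical location), the formulas take this form:

    =DAY(A1)

    =MONTH(A1)

    =YEAR(A1)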

    Truncating Data

    On rare occasions it is appropriate to delete the portion of a number that follows a decimal rather than to round it. For example, one may have a contract where one pays only for full hours of service, so partial hours are not paid at all rather than paid at a scaled rate. After determining the precise number of hours, one may still need to determine the number of paid hours. WARNING: Deleting decimal values is not equivalent to rounding and should never be confused with rounding.

    The function for deleting the portion after the decimal is TRUNC. In the graphic, the TRUNC formula returns 3. If the same number had been rounded instead, the result would be 4.
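    For illustration, with a hypothetical value such as 3.6, the formula could take this form:

    =TRUNC(3.6)

    By contrast, =ROUND(3.6, 0) returns 4.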

    Developing Excel Macros

    Developing Excel macros should be attempted only by experienced Excel users. This guidance is for the user who already knows the basics of macros but wants to apply that knowledge to forecast fitting. To learn basic macro skills, the user should consult Microsoft or other instructional material. The guidance will be useful for the student who already knows how to call up the macro-writing screens within Excel (the Microsoft Visual Basic editor within Excel).

    Typically macro functionality requires development of both spreadsheet elements and macro elements. In the spreadsheet, “Budget Tools Appendix E macro.xls” (Office 2003 and earlier) or “Budget Tools Appendix E macro.xlsm” (Office 2007), there is only one tab, “Table 5 for Macro.”1 Macros make sense only in the context of extending some functionality. This spreadsheet extends the forecast functionality from Chapter 4. The rest of this section follows on from the material in Chapter 4. First, the spreadsheet itself must be further developed. The additional development is shown in columns N through R in rows 1 through 6, columns N through R in rows 7 through 149, and in cells F4 and G4. Also, the initialization formulas in F6 and G6 have been slightly modified for the special case where β is zero.

    This is what the changes do: The changes in rows 7 through 149 in columns N and O provide an extended grid (pairs of values α and β) to search. Column P in the same rows is reserved for output (RMSE for each of 143 different pairs) from the macro. Columns Q and R in these rows and the formulas in cells P6, Q6, and R6 are used to identify the α and β values associated with the smallest RMSE generated by the 143 pairs.

    The entries in cells N1 and O1 are used by the macro to sequentially search through all 143 pairs; they are copied by formula into F4 and G4. Cell P1 is copied by the macro sequentially into cells P7 through P149 as N1 and O1 update. This is the critical grid search. The entry in P1 is the RMSE value also found in cell H3. Cell N2 is the switch that is set to 1 or 0; it is also the named range “Signal.” When the switch is set to 0, the table reads the values in N1 and O1 into cells F4 and G4 to generate the forecast; when it is set to 1 (after the optimal RMSE has been identified), the table reads the optimal values from N6 and O6 to generate the forecast.

    To use this table along with the original table to generate a grid search and optimal forecast, use the following macro, the execution of which is explained in Excel Help menus. (As stated at the beginning of this discussion, macros should be attempted only by experienced users.)

    MACRO (note, this line is not part of the macro)

    Sub FitMac()
        ' Tell the spreadsheet to read alpha and beta from N1 and O1 (Signal = 0)
        Let Range("Signal") = 0
        Sheets("Table 5 for Macro").Select
        ' Clear any output from a previous grid search (optional, but makes the effect visible)
        Range("p7:p149").ClearContents
        Range("N1").Select
        For a = 6 To 148
            For b = 0 To 1
                ' Copy the alpha (b = 0) or beta (b = 1) value for pair a into N1 or O1
                Let ActiveCell.Offset(0, b) = ActiveCell.Offset(a, b)
                ' Copy the current RMSE from P1 into column P beside pair a
                Let ActiveCell.Offset(a, 2) = ActiveCell.Offset(0, 2)
            Next b
        Next a
        ' Tell the spreadsheet to use the optimal alpha and beta (Signal = 1)
        Let Range("Signal") = 1
    End Sub

    END OF MACRO (note, this line is not part of the macro)

    This macro is included in the spreadsheet. As the first line of the macro says, the macro is labeled “FitMac.” The experienced user will know how to invoke it, so standard invocation guidance is not provided; however, if the spreadsheet has macro permission, the user can invoke this macro with the key combination Control-P. The second line turns the switch (in the named range “Signal”) to zero. The effect of this is to tell the spreadsheet to look to N1 and O1 for α and β. The third and fifth lines locate the cursor at a definite location. The line between them empties cells P7 through P149 of any prior content (this step is not strictly required, but it makes the effect of the macro more visible to the user). The next two lines begin an outer loop and an inner loop. Within the loops, the first assignment copies the α and β values from columns N7:O149, pair by pair, into N1 and O1; the second assignment copies the resulting RMSE from P1 into P7:P149 beside the corresponding pair of α and β values. The next two lines close the inner and outer loops. The next line turns the switch to 1, which has the effect of using the best RMSE to select the associated α and β values through the logic-math formulas in columns Q and R. The last line ends the macro.

    Exercises

    Use the spreadsheet named “Budget Tools Appendix E Exercises” and the tabbed sheet “Part 5.”

    Exercise 1.

    In the highlighted cell, test the two values above for equality.

    Exercise 2.

    In the highlighted cell, write a formula that divides cell A5 by cell A4 if they are not equal and multiplies them if they are equal.

    Exercise 3.

    • If cell A5 is equal to cell A4 and if cell A5 is greater than cell A4, add them together, otherwise return the value zero.
    • If cell A5 is either greater than or less than cell A4, subtract cell A5 from cell A4, otherwise add them together.

    Exercise 4.

    In the highlighted cell, write a formula that finds the cell address for the value 25 in problem 1.

    Exercise 5.

    In the highlighted cell, write a formula that uses the highlighted cell in Exercise 4 to display the value of the referenced cell.

    Exercise 6.

    In the highlighted cells show the absolute value for each number in the column.

    Exercise 7.

    In the highlighted cells show the minimum and maximum values for the original and absolute value columns of problem 6.

    Exercise 8.

    • Using the highlighted cells, find the square root for each of the associated values in the column.
    • What is the square root of −1? What is the meaning of this result?

    Exercise 9.

    Using the highlighted cells, extract the day, month, and year out of the given date.

    Exercise 10.

    For each of the numbers shown, first truncate each number in the labeled column, then round it to zero decimal places in the labeled column. Why are some cells in agreement and some not?

    Note

    1. Depending on the version of Office software, various steps must be taken to enable the use of macros. These steps may involve changing the Excel security level or simply clicking an “OK” button to allow use of a macro.

    Glossary

    • Absolute percent error. Absolute value of the forecast value minus observed value all divided by observed value.
    • Activity measures. Measure of the activities of the organization; for example, the hours of work performed by employees of a certain class.
    • Actuals. Actual expenditures, as opposed to budgeted amounts.
    • Allocative stage. Stage in the budget process when the budget is changed to reflect the legislative agreement.
    • Allotment. Specific amount of money that has been provided for a particular agency for a certain period of time.
    • Annual data. One observation that accounts for all the data for a whole year.
    • Annualization. Process by which a dollar amount, such as salaries, is spread out over the entire fiscal year rather than only a portion of the fiscal year.
    • Assets. Anything a person or company owns—such as money or property—that is of value.
    • Assumptions. Those factors or circumstances taken for granted as true without proof and that underlie an argument or problem.
    • Average costs. Total costs of a product or program divided by the total number of units involved.
    • Balance sheet. Statement of total assets and liabilities at one point in time for a governmental entity or corporation.
    • Base case. The most likely case upon which the cost-benefit analysis is initially conducted.
    • Benchmarks. Adoption of the achievement level of other similarly situated organizations.
    • Breakeven. Analysis required to determine the exact volume when the revenue equals the cost of the activities.
    • Budget implementation. Monies are spent according to the budget during a certain time; this usually refers to the fiscal year of the governmental agency or organization.
    • Calendarization. Process by which we break up expenses or revenues into monthly amounts; a cash flow spreadsheet laying out the fiscal year by the month is a calendarization of the revenues and expenditures.
    • Capital assets. Something that has a useful life or a defined period of time of more than one year and is usually obtained using borrowed dollars; most organizations set a minimum life expectancy and minimum value on an item before classifying a good or group of goods a capital good.
    • Capital budget. Document that indicates the proposed long-term investments in buildings and equipment.
    • Capital fund. Sum of money established to finance the acquisition of fixed assets, such as buildings or equipment.
    • Capital improvement plan (CIP). A planning document for a defined period of time that indicates the time, number, volume, and dollar amount of all acquisition and construction for an organization or government.
    • Capital spending. All charges used to support the capital budget, which supports construction and other expensive major long-term acquisitions.
    • Chart of accounts. Listing of categories of all transactions in a financial system with a numeric-alpha system to track the categories.
    • Classification and coding. Chart of account classifies every transaction into categories and codes these categories for easy reference.
    • Compounding. Process of earning interest on interest; compounding can be used to determine the future value of a current amount, in which the current interest is added to the principal in computing interest for the next period.
    • Consumer price index (CPI). Measure of the change in the prices of consumer goods and services over time.
    • Cost. Resources used to produce a good or service, often in monetary terms.
    • Cost-benefit analysis (CBA). Economic evaluation in which both the costs and the benefit are expressed in monetary units.
    • Cross-sectional data. Data that are a series of observations about a particular data set that varies according to a variable at the same point in time.
    • Debt service fund. Sum of money in which money is set aside to pay for a debt.
    • Discount rate. Interest rate used to calculate the present values of future amounts.
    • Discretion. Ability of the public employee to make substantial decisions about the activities of government to achieve the democratically defined outcomes.
    • Early stage outcome. Outcome that occurs during or soon after the organization has performed its service.
    • Efficiency measure. Ratio of the output to the input.
    • Equity. Fairness, particularly with respect to the distribution of public services.
    • Expenditure codes. Five-digit codes that are arranged by functional unit and object of expenditure or expense.
    • Expense codes. See expenditure codes.
    • Exponential smoothing. Process for finding the local mean value of a time series based on a weighted average between the older part of the series and the most recent observation, where the weights are the opposite shares of one or more parameters; the mean found by this method is used as a forecast.
    • Fiduciary funds. Funds that track transactions in which government acts as a trustee, such as collecting and tracking workmen's compensation or Social Security payroll taxes and then depositing those funds with the state.
    • Financial efficiency. Efficiency that can be measured in monetary units.
    • Financial plan. Planning document that indicates how an organization is going to use its resources over a specific period of time; the plan includes projected budgets usually over a period of four or five years.
    • Financial statement. Report stating the financial condition of a government or corporation in separate statements.
    • Fixed cost. Cost that remains constant over a given period of time or a certain volume of activity.
    • Forecast. Estimated value for a period, based on information from previous periods.
    • Fund accounting. System of accounting in which assets and liabilities are grouped according to the purpose and restriction for which they are to be used.
    • Fund source. Specific part of an appropriation account that has been set aside to fund a particular work to be performed.
    • General fund. In governmental accounting, a sum of money set aside for day-to-day expenses, and not for special purposes.
    • General ledger codes. Three-digit codes arranged in balance sheet order: assets, followed by liabilities and fund equity.
    • Governmental funds. Usually refers to any fund operated by any level of government; these funds use the modified accrual accounting method.
    • Income statement. Statement of all revenues and expenditures over a defined period of time.
    • Index. Numerical scale of change in a particular subject area; a common index is a stock index.
    • Indicator. Indirect measure of what is to be calculated.
    • Indicators. Indirect measures of what you want to measure.
    • Inflation. Increase in general prices over time that reduces consumer purchasing power.
    • Input. Amount used to invest in a given activity.
    • Interim outcome indicator. Indicator, sometimes called a proxy, that measures something instead of the desired outcome as a substitute until the outcome can be more directly observed.
    • Level. Frequency at which data are recorded: weekly, monthly, quarterly, annually, and so forth.
    • Liability. Anything that is owed to someone else; in government or nonprofit accounting, liability refers to all the debts that organization owes.
    • Life cycle costing (LCC). Analytical method used to determine the total cost of ownership, including acquisition, operation, maintenance, and disposal.
    • Line item. Itemized account or appropriation usually in a budget or appropriation bill.
    • Line item veto. The veto of one item from a list of items in an appropriation; some governors and mayors possess this power by law.
    • Loss function. Summary value that measures the effectiveness of the forecast.
    • Marginal cost. Cost incurred by government or nonprofit organizations that results from producing one more unit or from adding one more client to an existing program.
    • Mean error (ME). A bias measure, the sum of the errors divided by the count of the errors.
    • Mean square error (MSE). A statistic that quantifies the accuracy of a forecasting procedure; it begins with the forecast minus the observed value for each observation of a forecast; these values are squared and summed, and then the average is found; the resulting value is the mean square error of the forecast.
    • Measures. In budgeting, measures are variables that can be used in analysis.
    • Measures of process. Measures of the way the organization uses inputs to achieve outputs and outcomes.
    • Measures of satisfaction. Measures of the response of a target audience to the activities, outputs, or outcomes of an organization; generally measured through surveys.
    • Midpoint forecast. Forecast of the middle of the range of likely values; actual value will involve some random variation (this part cannot be predicted).
    • Net assets. The total assets minus all the liabilities that an organization owes.
    • Net present value (NPV). Difference between the present values of the benefits and the present value of the costs often associated with a long-term project.
    • Nonpersonnel services (NPS). Budget items not included in personnel services; common categories of NPS are supplies, equipment, travel, construction, and consultants.
    • Nonseasonal data. Data that do not exhibit a repeating pattern each year.
    • Object code. Classification to identify the type of purchase, or service, or any other charge received in a financial system.
    • Objects of expenditure. Expenditures can be classified by agreed upon codes in different types of categories, such as travel costs, contractual services, and so forth.
    • Operating expenditures (funds). All charges used to support day-to-day operations of an agency and that are expenditures other than capital expenditures.
    • Other than personnel services (OTPS). See nonpersonnel services.
    • Out year. The year beyond the current fiscal year. Often used in plural in a discussion of forecasting data several years into the future.
    • Outcome. Long-term achievement of what was originally established that the agency would achieve.
    • Outlier. Data that are extreme in respect to other values.
    • Output. Immediate short-term measure of an activity.
    • Pay as you go. Financing capital assets through fees, taxes, or other operating revenues (also known as pay go).
    • Percentage change. Units after a time period minus units before the time period, divided by units before the time period, and multiplied by 100; as a formula, (X2 − X1) / X1 * 100.
    • Percentage of total. Units (such as dollars) for a part (such as part of a budget) divided by units for the whole, multiplied by 100; as a formula, Xi / X * 100, where Xi is the part and X is the total.
    • Performance measurement. General term referring to various ways of observing the performance of an organization.
    • Permanently restricted funds. Funds that usually cannot be spent although the interest earned on the funds may be used.
    • Personnel services (PS). Those budget items relating to salaried employees, including salaries and fringe benefits.
    • Post hoc accountability. Reporting achievement to a policy-making body rather than following detailed procedures issued by that body.
    • Primary data. Data collected directly through an instrument or observation, typically for use in analysis by those who collect the data.
    • Process efficiency. The ratio of output to input with respect to any process.
    • Process measures. See measures of process.
    • Program code. Number related to the accounting system that defines where the transactions concerning a particular program are categorized.
    • Proprietary funds. Funds in a governmental unit that are set up to account for activities that are similar to those in the private sector; proprietary funds are self-supporting in that the services are financed through user charges; proprietary funds use the economic resources measurement focus and the accrual basis of accounting.
    • Ratio analysis. Study, using ratios, of the financial condition of a government entity or nonprofit.
    • Ratios. Relative relationship that is calculated between two numbers.
    • Reference groups. Organizations, governments, or other entities that are similarly situated (compared with your organization) with respect to the information you are analyzing or communicating.
    • Revenue codes. Four-digit codes that are arranged by source.
    • Revenues. Broad general term referring to any money coming to a government or other entity for its own expenditures, in other words, all the money received by a governmental unit or nonprofit group; funds can come from many different places and be used for a variety of purposes.
    • Root mean square error. A loss function; the square root of the sum of the squared errors divided by the count of the squared errors.
    • Secondary data. Data collected by someone else, at an earlier time or under a different circumstance and made available through a data set or database.
    • Sensitivity analysis. Process of varying parameters in a given model to assess the level of change in its outcome.
    • Simulation. Method of forecasting in which the forecaster can forecast all the elements of an entire system.
    • Special revenue fund. Funds in governmental accounting set aside for specific purposes, unlike a general fund.
    • Standard. Setting targets based on guidelines established by authoritative bodies.
    • Step-fixed costs. Costs that remain constant within a given range of activities and length of time but that can change stepwise beyond these critical points.
    • Target. Proposed and budgeted level of performance.
    • Temporarily restricted funds. Funds that are restricted in use and cannot yet be spent.
    • Time series. Data that periodically gain a new observation from the same process.
    • Time value of money (TVM). Principle that money received in the present is worth more than the same amount received in the future; TVM is the basis for discounted cash-flow calculations.
    • Trend. Tendency for the series to increase or decrease from period to period.
    • Unrestricted funds. See operating expenditures.
    • Vacancy rate. Calculation of the percentage of positions that are vacant.
    • Variable costs. Costs that vary with volume.
    • Variance. Difference between what is budgeted and what is actually spent.
    • Variance analysis. Process of comparing a budget that has already been created with the actual expenditures.

    About the Authors

    Greg G. Chen is an associate professor at Baruch College, City University of New York; he has had broad working experience in China, Canada, and the United States. Professor Chen was the budget manager for the Ministry of Finance and the premier's office of British Columbia, Canada, and senior policy adviser and senior researcher in various government agencies and public authorities in British Columbia before taking his professorship in the United States. Professor Chen conducts research and publishes papers on the subjects of public and corporate finance, budgeting and cost-benefit analysis, traffic safety and injury prevention, school safety and student achievement, comparative health care systems, research methodology, and program evaluation.

    Dall W. Forsythe, a professor of practice at the Wagner School at New York University, has extensive management experience in the government, private, and nonprofit sectors. In government, he served as budget director for the State of New York and for the New York City Board of Education. In the private sector, he worked as a managing director in the public finance department of Lehman Brothers. In the nonprofit sector, Forsythe served as chief administrative officer of the Episcopal Diocese of New York for four years. He is author of Memos to the Governor: An Introduction to State Budgeting (2004), and in 1998 he received the S. Kenneth Howard Award, a career achievement award from the Association for Budgeting and Financial Management.

    Lynne A. Weikart is associate professor at Baruch College, City University of New York. Before her academic career, she held several high-level government positions, including budget director of the Division of Special Education in New York City's public schools and executive deputy commissioner of the New York State Division of Human Rights. For several years, she also served as the executive director of City Project, a progressive fiscal think tank that focused on reforming New York City's resource allocation patterns. Her current research focuses on resource allocation in urban areas as well as the budgeting process and gender issues, and she has published several articles about these subjects. Professor Weikart won the Luther Gulick Award for Outstanding Academic, New York Metropolitan Chapter of the American Society for Public Administration, in 2001. Her book, Follow the Money: Who Controls Urban Mayors? A Case Study of Decision Making during Fiscal Crises, is in press.

    Daniel W. Williams is associate professor at Baruch College, City University of New York. Before joining the faculty at Baruch, he held several high-level positions in government, including the budget directorship of the Virginia Department of Medical Assistance Services. His research focuses on the history of performance measurement. His articles have appeared in Administrative Theory & Praxis, Public Administration Review, and Administration & Society. He also conducts research into technical forecasting.

