QGEN questionnaire cleaning example

This example demonstrates how QGEN might be used to clean the data from a typical marketing research product awareness, trial and usage survey.

                             A SAMPLE QUESTIONNAIRE
=============================================================================
                     1A     1B     1C     1D     1E     1F     1G     1H
                    -----  -----  -----  -----  -----  -----  -----  -----
                           OTHER                       BRANDS
                    FIRST   UN-                        BOUGHT  PRI-  BRAND
                     MEN-  AIDED  AIDED  EVER   PAST 6 REGU-   MARY  BOUGHT
                    TION   AWARE  AWARE BOUGHT  MONTHS LARLY   BRAND LAST
                    -----  -----  ----- ------  ------ ------ ------ ------

A. Brand A ........  1 (20) 1 (22) 1 (24) 1 (26) 1 (28) 1 (30) 1 (32) 1 (34)
B. Brand B ........  2      2      2      2      2      2      2      2
C. Brand C ........  3      3      3      3      3      3      3      3
D. Brand D ........  4      4      4      4      4      4      4      4
E. Brand E ........  5      5      5      5      5      5      5      5
F. Brand F ........  6      6      6      6      6      6      6      6
G. Brand G ........  7      7      7      7      7      7      7      7
H. Brand H ........  8      8      8      8      8      8      8      8
I. Brand I ........  9      9      9      9      9      9      9      9
J. Brand J ........  0      0      0      0      0      0      0      0

K. Brand K ........  1 (21) 1 (23) 1 (25) 1 (27) 1 (29) 1 (31) 1 (33) 1 (35)
L. Brand L ........  2      2      2      2      2      2      2      2
M. Brand M ........  3      3      3      3      3      3      3      3
N. Brand N ........  4      4      4      4      4      4      4      4
O. Brand O ........  5      5      5      5      5      5      5      5
P. Brand P ........  6      6      6      6      6      6      6      6
Q. Brand Q ........  7      7      7      7      7      7      7      7
R. Brand R ........  8      8      8      8      8      8      8      8
S. Brand S ........  9      9      9      9      9      9      9      9

   Others__________  X      X      X      X      X      X      X      X
   (List)__________

=============================================================================

1a)   When  you think of products that (...) what one brand first comes to mind?
      (INTERVIEWER:  CIRCLE ONLY ONE IN COLUMN 1A.)

1b)   What other brands  can you think  of? (PROBE:) What  others come to  mind?
      (CIRCLE AS MANY AS APPLY IN COLUMN 1B.)

1c)   (HAND RESPONDENT CARD)  Looking at this card,  please read me the  letters
      next  to each brand  you have ever  heard of, including  any you may  have
      already mentioned. (CIRCLE ALL THAT APPLY IN COLUMN 1C.)

1d)   This  time, look down the list and read me the letters next to each of the
      brand you have ever bought.  (CIRCLE ALL THAT APPLY IN COLUMN 1D.)

1e)   Which of these brands have you purchased in the last 6 months? (CIRCLE ALL
      THAT APPLY IN COLUMN 1E.)

1f)   Which brands do you buy on a regular basis? (RECORD IN COLUMN 1F.)

1g)   (IF MORE  THAN ONE  BRAND MENTIONED  IN 1F)  Of the  brands you  buy on  a
      regular basis,  which one  would you consider  to be  your primary  brand?
      (CIRCLE ONE BRAND IN COLUMN 1G.)

1h)   What one brand did you buy last?  (RECORD ONE BRAND IN COLUMN 1H.)
=============================================================================

Questions 1B through 1F allow for multiple answers to be circled in each column, so the data will be entered in column-binary format, a common method of representing data in marketing research surveys because it allows one to store multiple responses in a single data column.

Here are some items you might want to check the data for:

There should be no stray punches, that is, codes other than those listed on the grid.
Question 1A must have one, and only one, answer circled.
Any answer to question 1A must not also be circled in question 1B (except "other", which could refer to several different brands).
A respondent must be aware of brands ever purchased.

In addition to generating an error listing, you might want to force edit in certain responses that were not explicitly asked in the questionnaire itself, for example:

If only one brand was mentioned in question 1F, the interviewer should have skipped question 1G, but in that case, the answer to 1F is also the answer to question 1G and should appear in that column as well.
Frequently, as an interview progresses and questions become more specific, respondents remember more brands than they first mentioned, so brands listed as purchased in questions 1E through 1H, but not in 1D, should be edited into the awareness question 1D.

Finally, you might wish to generate some summary codes in the data to make it it easier to tabulate or to get counts that cannot be read directly from marginal counts, such as nets.

Certain brands may be of special interest. For example, if Brands D, I, N, P and S are your client's brands, you could add a NET code to the data, showing net awareness or purchase of any one or more of those brands for each question.
The client wants to contact respondents who use more than one of their brandsfor an in-depth follow-up interview. Print a listing of the ID numbers for all those who qualify for this follow-up study, along with the list of the client's brands mentioned.

Here is a sample of QGEN specifications written to apply the cleaning rules and recodes listed above, extensively annotated to explain exactly what actions are being performed. The actual instructions read by QGEN are shown here in boldface--everything else is a comment in the spec file.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Data directives: Input and output have 80 columns = 160 bytes per case.
*                  Listing file will list specs, ID numbers of all cases 
*                  breaking cleaning rules, run summaries and statistics.
*                  Report file prints ID list created in item # 8.
*
INTAP     olddata.dat ( 160
OUTAP     newdata.dat ( 160
LISTING   cleanrun.txt
REPORT    userlist.txt
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Address mode is column-binary with card/column addressing.
*
SETUP   ADDRMODE B
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Run options:  keywords indicating the location and length of the ID,
*               requesting sequence checking, and calling for a clear of
*               cards 1 through 12 between each record.
*
RUNOPS   ID 101B4   SEQ A   CLEAR 101,280
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Item 1: loop through all 8 2-column questions, checking that the punches
* X and Y do not appear in the first column of each pair, and the punches
* 0 and Y do not appear in the second column of each pair.
*
000 1-8         LOOP   002       002
?BAD-PUNCH      AND    120'NXY'  121'N0Y'
*
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Item 2: check that question 1A has no more than one answer
*
?1A-MULTI       SPB    120'1-0' 121'1-9X'
*
* Item 3: Loop using punch increments to compare questions 1A and 1B on a
* punch by punch basis, starting with brand A, and ending with brand S.
* check that, if a brand is coded in 1A, it is not coded in 1B.
* (Note that the times-to-loop assumes the X and Y punch positions.)
*
000 1-21        LOOP    PI1       PI1
?1A-MATCH-1B    CAND    120'1'    122'N1'
*
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Item 4: Loop using punch increments to compare questions 1D through 1H
* (ORed together) against question 1C on a punch by punch basis, for all
* brands (including others).  Check that all brands appearing in 1D - 1H
* also appear in 1C.
*
000 1-23        LOOP    PI1    PI1
?1C-NOT-CODED   CAND    126'1'!128'1'!130'1'!132'1'!134'1'  124'1'
*
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Item 5: Tally (count) the number of brands mentioned in question 1F into
* counter Z1; if the number is less than 2, copy question 1F into 1G.
*
Z1        TAL   130'1-0'  131'1-9X'
132       CMOV  Z1<'2'    130B2
*
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Item 6: For each of questions 1E through 1H, NET the results into 1D.
*
000 1-4   LOOP  000       002
126       NET   126B2     128B2
*
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Item 7: code a Y punch in the second column of each question if brands
* D, I, N, P or S were mentioned in that question.
*
002 1-8   LOOP  002
121'Y'    IF    120'49'!121'469'
*
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Item 8: if more than one of the client's brands (D, I, N, P or S) are
* coded in question 1F, call a subroutine to clear a character-mode area
* convert the ID number to character-mode, list the client's brands used
* in columns 130 and 131 as characters, and print the character string.
* (Note: column-binary columns use 2 bytes, ASCII characters use 1 byte).
*
LNKLC   MP    130'49'  131'469'                    (call the subroutine)
SKPXX   ALL                                        (skip past subroutine)
*
_LC_      NOP                              (subroutine begins here)
201C132*  ALL                                    (clear area to ASCII)
201       BTA  101B4                             (ID to ASCII)
204       MOV  'Uses brands:'                    (load text constant)
211       CMOV 130'4' 'D '
212       CMOV 130'9' 'I '
213       CMOV 131'4' 'N '
214       CMOV 131'6' 'P '
215       CMOV 131'9' 'S '
PRT201    ALL                                    (print line)
RET       ALL                                    (return to program)
*
_XX_      NOP
*
END

The first seven items in this spec file illustrate the kind of routine cleaning and recode operations that will be familiar to anyone experienced at cleaning commercial survey questionnaire data.
The eighth item illustrates use of callable subroutines in QGEN and shows how it caan be used as a report generator.