Analyzing Text With Word Count and Crosstab


© 2007 KnowWare International, Inc. DBA LifeStar
2253 S. Oneida St., Ste 3D
Denver, CO 80224

Jay Arthur
888-468-1537
303-756-9144
Email Us

We help people think!

To succeed at Six Sigma, you'll often have to analyze and summarize text data. Most companies have lots of transaction data from "flat files" like the one shown below, but because the data consists of words, sentences and raw numbers, they sometimes have a hard time figuring out what to do with it.

To summarize and analyze this data, you can use Excel's PivotTable function, but we've found that people have a hard time figuring out how to use it.

Pivot tables can:

  • Count the number of times a phrase exists in a column of data (e.g., "complication")
  • Count the number of times a phrase occurs in relation to another column (e.g., physician and adverse events)
  • Count, sum or average numbers in relationship to another column (e.g., average age of complications or total charges per physician)

Since there's so much of this data out there and so few people seem to know how to analyze it, the QI Macros include tools to simplify analyzing flat text files like these:

  • Word Count simply cuts each word out of every sentence in every cell you've selected and then uses pivot tables to count the occurrences and order them in descending order. (Maximum words: 65536)
  • Crosstab Wizard will take columns of data and create a pivot table that summarizes the data in the most likely manner it can devise. After years of pivot tables, we realized that we've developed some mental rules for what type of data to put in which page, row, column or data field for the pivot table to analyze.
    (Maximum Columns: 1-4 columns at a time. While Pivot Tables will handle more than 4, we have found that 4 at a time is sufficient to handle most analysis. Once you've created a pivot table with 4 columns, you can always use Pivot Table's drag-and-drop functionality to add more fields.

With Word Count and the CrossTab Wizard, you can:

  • Find the most frequently used words in the MEMO column.
  • Count the number of times each doctor had a "Complication" during delivery.
  • Sum or average the charges per delivery by doctor.
  • Count the number of deliveries for each diagnosis.

And do it easily.

Word Count

Purpose: Count all of the unique words in selected cells to identify patterns, trends and pareto patterns.

A lot of interesting data is concealed in comments entered by service representatives. Word count parses the words out of sentences and paragraphs and uses Pivot Tables to count the occurrences of individual words and sort them in descending order. To count the words in your selection:

  1. Select the cells you want to analyze (in this case MEMOTEXT from a different file).
  2. Click on QI Macros-Data Transformation-Word Count

Original comments look like the cells shown below.
Word Count
will then parse each word out of each cell and summarize and order them using Excel's Pivot Table function (below right).

     word count excel

Note: From these comments the most likely cause of rejected claims--was determined to be overlapping dates of service (dos).

CrossTab Wizard

Pivot Tables are a Great Tool, but the User Interface is Too Awkward For Most People

I have found that few people know how to use Excel's PivotTable function to analyze this kind of data. I don't know why, because it's relatively simple drag-and-drop interface.

I have to believe that it's because the user interface isn't intuitive. So that's why we created the CrossTab Wizard. Here's how you to do it step-by-step using our Crosstab Wizard:

Step 1: Your Data Must Have Column Headings!

As you can see in this data sheet, each column has a heading. The Pivot Table will not run if there is a blank cell in any heading. One of the first mistakes people make is inserting blank columns to make the file more readable, and then they wonder why the Pivot Table won't work.

Avoid Mistakes: No Blanks In Column Headings!

Step 2: Select The Data

You can select 1-4 columns of data using your mouse or you can click on up to four column headings (see below: Physician, Total Charge, Date, and Adverse Events) and the Wizard will automatically expand the selection to include all of the data in the column.

Original Transaction File

CrossTab Wizard Output
The CrossTab Wizard takes the data that you've selected and invokes Excel's Pivot table function to summarize them. Based on the content of each column, the QI Macros CrossTab Wizard figures out where to place each slice of data (page, row, column or data field):
CrossTab Wizard Pivot Table


Based on our experience in creating pivot tables, the Wizard put the Adverse Events into the Page field, dates into the Row field, Physician into the Column Field and Sum of Charges into the Data field.

Step 3: Picking the Best Layout for Your Data

The PivotTable works on a "drag-and-drop" interface. I use:

  • Page Fields for higher level summaries (e.g., facility or location names)
  • Row Fields should have the most frequent heading (often dates)
  • Column Fields should contain less frequently used headings (Excel only has 256 columns available). If you select a column with too many unique words in the cells, the PivotTable will overflow.
  • Data Items are where you drag and drop the words or numbers you want to count or summarize.

You can change data views by clicking on the pull down arrow next to the Page field:



Now let's say that I wanted to analyze the charges in terms of adverse events (pregnancy didn't go as planned). I'd just drag and drop Adverse Events into the Page Fields. The Pivot Table gives me a choice of viewing all charges or just the charges with the key word complication, Outlier, Readmission, etc.:

If you double click on "Sum of Total Charge" and change it to Average, you get the average cost per delivery:

I could also change it back and group all of charges into monthly charges.
How to: Just Right click on any date and select "group" by month:

The Pivot Table will group the months.


Warning! Bonehead Excel Behavior:
Grouping dates will not work if there is even a single blank or text cell where there should be a date.

Now I select B4:I4, then hold down the control key and select B8:I8 and run a pareto chart:

Changing The Focus

What if I wanted to change this table to count adverse events by Physician? Hint: It's easy with drag and drop.

1. Just click on date and pull it out of the table.

2. Click on Sum of total Charge and pull it out of the table.

3. Drag Adverse events down into Data Fields.

4. Then click on the Age field and drag it into the Row Fields:

Are there more younger women with complications or are there just more younger women giving birth? Does one physician have more complications than the others? We could run a pareto chart to show complications by physician (MD6 is 40% of total complications, almost twice as high as his or her peers):

What Else?

Maybe I'd like to evaluate average length of stay (LOS) for all deliveries. Just pull the adverse events out of the table and drop in LOS (change it to an average). Not much going on here. The youngest and the oldest had a slightly longer length of stay:

Get the Idea?

There's a wealth of information hiding in these dense flat files of words and numbers. Start using the Excel's Pivot Table function to to slice and dice your files (no matter how large). Then use the QI Macros to graph the results. You'll find it easy to find the 4% that leads to 50% of the problem and start making breakthrough improvements.

Buy QI Macros Excel SPC Software for Histograms and Control Charts Now QI Macros Excel SPC Software for Histograms and Control Charts $139 Plus S&H

 


© 2007 KnowWare International Inc. (888) 468-1537
knowwareman@qimacros.com
Home | Lean Six Sigma | QI Macros | KnowWare | Site Map

QI Macros 30-day Trial & FREE QI Macros Course
Name:
E-mail
 

QI Macros SPC Software for Excel
Or Buy It Now!
Unconditional
90-Day
Money-Back
Guarantee

The QI Macros for Excel $139, is an inexpensive easy to use set of Excel add-ins for statistical process control and Lean Six Sigma. It draws line, pie, bar, pareto, box whisker, histogram (Cp, Cpk), scatter and control charts (with stability analysis).

It contains over 70 fill in the blank templates such as the Ishikawa diagram, QFD, DOE, FMEA, PPAP, and Gage R&R for MSA. Performs ANOVA, t-test, F-test, and regression analysis.

Buy All-in-One, Excel SPC Software Now

Quantity Discounts

Customer Testimonials

Minitab Comparison

QI Macros FAQs

FREE Lean Six Sigma Course
& QI Macros 30-day Trial
Name:
E-mail