Analyzing Text With Word Count and PivotTable

To succeed at Six Sigma, you'll often have to analyze and summarize text data. Most companies have lots of transaction data from "flat files" like the one shown below, but because the data consists of words, sentences and raw numbers, they sometimes have a hard time figuring out what to do with it.

pivottable data

To summarize and analyze this data, you can use Excel's PivotTable function, but we've found that people have a hard time figuring out how to use it.

Pivot tables can:

  • Count the number of times a phrase exists in a column of data (e.g., "complication")
  • Count the number of times a phrase occurs in relation to another column (e.g., physician and adverse events)
  • Count, sum or average numbers in relationship to another column (e.g., average age of complications or total charges per physician)

Since there's so much of this data out there and so few people seem to know how to analyze it, QI Macros includes tools to simplify analyzing flat text files like these:

  • Word Count simply cuts each word out of every sentence in every cell you've selected and then uses pivot tables to count the occurrences and order them in descending order. (Maximum words: 65536)
  • PivotTable Wizard will take columns of data and create a pivot table that summarizes the data in the most likely manner it can devise. After years of pivot tables, we realized that we've developed some mental rules for what type of data to put in which page, row, column or data field for the pivot table to analyze.
    (Maximum Columns: 1-4 columns at a time. While Pivot Tables will handle more than 4, we have found that 4 at a time is sufficient to handle most analysis. Once you've created a pivot table with 4 columns, you can always use Pivot Table's drag-and-drop functionality to add more fields.

With Word Count and the PivotTable Wizard, you can:

  • Find the most frequently used words in the MEMO column.
  • Count the number of times each doctor had a "Complication" during delivery.
  • Sum or average the charges per delivery by doctor.
  • Count the number of deliveries for each diagnosis.

And do it easily.

Word Count

Purpose: Count all of the unique words in selected cells to identify patterns, trends and pareto patterns.

A lot of interesting data is concealed in comments entered by service representatives. Word count parses the words out of sentences and paragraphs and uses Pivot Tables to count the occurrences of individual words and sort them in descending order. To count the words in your selection:

  1. Select the cells you want to analyze (in this case MEMOTEXT from a different file).
  2. Click on QI Macros-Data Transformation-Word Count
    word-count-menu

Original comments look like the cells shown below on the left.  Word Count will then parse each word out of each cell and summarize and order them using Excel's Pivot Table function. See image below right. 

pivottable of text data
counted words and phrases in Excel

      

Note: From these comments the most likely cause of rejected claims was determined to be overlapping dates of service (dos).

PivotTable Wizard

Pivot Tables are a Great Tool, but the User Interface is Too Awkward For Most People

I have found that few people know how to use Excel's PivotTable function to analyze this kind of data. I don't know why, because it's relatively simple drag-and-drop interface.

excel pivottable

I have to believe that it's because the user interface isn't intuitive. So that's why we created the PivotTable Wizard. Here's how you to do it step-by-step using our PivotTable Wizard:

Step 1: Your Data Must Have Column Headings!

As you can see in this data sheet, each column has a heading. The Pivot Table will not run if there is a blank cell in any heading. One of the first mistakes people make is inserting blank columns to make the file more readable, and then they wonder why the Pivot Table won't work.

excel pivot table data

Avoid Mistakes: No Blanks In Column Headings!

Step 2: Select The Data

You can select 1-4 columns of data using your mouse or you can click on up to four column headings (see below: Physician, Total Charge, Date, and Adverse Events) and the Wizard will automatically expand the selection to include all of the data in the column.

Original Transaction File

select data for pivottable

PivotTable Wizard Output 

The PivotTable Wizard takes the data that you've selected and invokes Excel's Pivot table function to summarize them. Based on the content of each column, the QI Macros PivotTable Wizard figures out where to place each slice of data (page, row, column or data field):
PivotTable Wizard Pivot Table output


Based on our experience in creating pivot tables, the Wizard put the Adverse Events into the Page field, dates into the Row field, Physician into the Column Field and Sum of Charges into the Data field.

Step 3: Picking the Best Layout for Your Data

The PivotTable works on a "drag-and-drop" interface. I use:

  • Page Fields for higher level summaries (e.g., facility or location names)
  • Row Fields should have the most frequent heading (often dates)
  • Column Fields should contain less frequently used headings (Excel only has 256 columns available). If you select a column with too many unique words in the cells, the PivotTable will overflow.
  • Data Items are where you drag and drop the words or numbers you want to count or summarize.

You can change data views by clicking on the pull down arrow next to the Page field:

pivottable page field example

Now let's say that I wanted to analyze the charges in terms of adverse events (pregnancy didn't go as planned). I'd just drag and drop Adverse Events into the Page Fields. The Pivot Table gives me a choice of viewing all charges or just the charges with the key word complication, Outlier, Readmission, etc.:

pivot table page field example

If you double click on "Sum of Total Charge" and change it to Average, you get the average cost per delivery:

pivot table to calculate averages

I could also change it back and group all of charges into monthly charges. 
How to: Just Right click on any date and select "group" by month:

pivot table group example 

The Pivot Table will group the months.

pivot table to summarize data


Warning! Bonehead Excel Behavior:
 Grouping dates will not work if there is even a single blank or text cell where there should be a date.

Now I select B4:I4, then hold down the control key and select B8:I8 and run a pareto chart:

pareto chart from pivot table

Changing The Focus

What if I wanted to change this table to count adverse events by Physician? Hint: It's easy with drag and drop.

1. Just click on date and pull it out of the table.

2. Click on Sum of total Charge and pull it out of the table.

3. Drag Adverse events down into Data Fields.

4. Then click on the Age field and drag it into the Row Fields:

Are there more younger women with complications or are there just more younger women giving birth? Does one physician have more complications than the others? We could run a pareto chart to show complications by physician (MD6 is 40% of total complications, almost twice as high as his or her peers):

pareto chart of defect data

What Else?

Maybe I'd like to evaluate average length of stay (LOS) for all deliveries. Just pull the adverse events out of the table and drop in LOS (change it to an average). Not much going on here. The youngest and the oldest had a slightly longer length of stay:

pivot table of adverse event data

Get the Idea?

There's a wealth of information hiding in these dense flat files of words and numbers. Start using the Excel's Pivot Table function to to slice and dice your files (no matter how large). Then use QI Macros to graph the results. You'll find it easy to find the 4% that leads to 50% of the problem and start making breakthrough improvements.

 

Create these charts and diagrams in just seconds using QI Macros for Excel...
Try It Now!