Analyzing Text With Word Count and PivotTable
To succeed at Six Sigma, you'll often have to analyze and summarize text data. Most companies have lots of transaction data from "flat files" like the one shown below, but because the data consists of words, sentences and raw numbers, they sometimes have a hard time figuring out what to do with it.
To summarize and analyze this data, you can use Excel's PivotTable function, but we've found that people have a hard time figuring out how to use it.
Pivot tables can:
- Count the number of times a phrase exists in a column of data (e.g., "complication")
- Count the number of times a phrase occurs in relation to another column (e.g., physician and adverse events)
- Count, sum or average numbers in relationship to another column (e.g., average age of complications or total charges per physician)
Since there's so much of this data out there and so few people seem to know how to analyze it, QI Macros includes tools to simplify analyzing flat text files like these:
- Word Count simply cuts each word out of every sentence in every cell you've selected and then uses pivot tables to count the occurrences and order them in descending order. (Maximum words: 65536)
- PivotTable Wizard will take columns of data and create a pivot table that summarizes the data in the most likely manner it can devise. After years of pivot tables, we realized that we've developed some mental rules for what type of data to put in which page, row, column or data field for the pivot table to analyze.
(Maximum Columns: 1-4 columns at a time. While Pivot Tables will handle more than 4, we have found that 4 at a time is sufficient to handle most analysis. Once you've created a pivot table with 4 columns, you can always use Pivot Table's drag-and-drop functionality to add more fields.
With Word Count and the PivotTable Wizard, you can:
- Find the most frequently used words in the MEMO column.
- Count the number of times each doctor had a "Complication" during delivery.
- Sum or average the charges per delivery by doctor.
- Count the number of deliveries for each diagnosis.
And do it easily.
Purpose: Count all of the unique words in selected cells to identify patterns, trends and pareto patterns.
A lot of interesting data is concealed in comments entered by service representatives. Word count parses the words out of sentences and paragraphs and uses Pivot Tables to count the occurrences of individual words and sort them in descending order. To count the words in your selection:
- Select the cells you want to analyze (in this case MEMOTEXT from a different file).
- Click on QI Macros-Data Transformation-Word Count
Original comments look like the cells shown below on the left. Word Count will then parse each word out of each cell and summarize and order them using Excel's Pivot Table function. See image below right.
Note: From these comments the most likely cause of rejected claims was determined to be overlapping dates of service (dos).
Pivot Tables are a Great Tool, but the User Interface is Too Awkward For Most People
I have found that few people know how to use Excel's PivotTable function to analyze this kind of data. I don't know why, because it's relatively simple drag-and-drop interface.
I have to believe that it's because the user interface isn't intuitive. So that's why we created the PivotTable Wizard. Here's how you to do it step-by-step using our PivotTable Wizard:
Step 1: Your Data Must Have Column Headings!
As you can see in this data sheet, each column has a heading. The Pivot Table will not run if there is a blank cell in any heading. One of the first mistakes people make is inserting blank columns to make the file more readable, and then they wonder why the Pivot Table won't work.
Avoid Mistakes: No Blanks In Column Headings!
Step 2: Select The Data
You can select 1-4 columns of data using your mouse or you can click on up to four column headings (see below: Physician, Total Charge, Date, and Adverse Events) and the Wizard will automatically expand the selection to include all of the data in the column.
Original Transaction File
PivotTable Wizard Output
The PivotTable Wizard takes the data that you've selected and invokes Excel's Pivot table function to summarize them. Based on the content of each column, the QI Macros PivotTable Wizard figures out where to place each slice of data (page, row, column or data field):
Based on our experience in creating pivot tables, the Wizard put the Adverse Events into the Page field, dates into the Row field, Physician into the Column Field and Sum of Charges into the Data field.
Step 3: Picking the Best Layout for Your Data
The PivotTable works on a "drag-and-drop" interface. I use:
- Page Fields for higher level summaries (e.g., facility or location names)
- Row Fields should have the most frequent heading (often dates)
- Column Fields should contain less frequently used headings (Excel only has 256 columns available). If you select a column with too many unique words in the cells, the PivotTable will overflow.
- Data Items are where you drag and drop the words or numbers you want to count or summarize.
You can change data views by clicking on the pull down arrow next to the Page field:
Now let's say that I wanted to analyze the charges in terms of adverse events (pregnancy didn't go as planned). I'd just drag and drop Adverse Events into the Page Fields. The Pivot Table gives me a choice of viewing all charges or just the charges with the key word complication, Outlier, Readmission, etc.:
If you double click on "Sum of Total Charge" and change it to Average, you get the average cost per delivery:
I could also change it back and group all of charges into monthly charges.
How to: Just Right click on any date and select "group" by month:
The Pivot Table will group the months.
Warning! Bonehead Excel Behavior: Grouping dates will not work if there is even a single blank or text cell where there should be a date.
Now I select B4:I4, then hold down the control key and select B8:I8 and run a pareto chart:
Changing The Focus
What if I wanted to change this table to count adverse events by Physician? Hint: It's easy with drag and drop.
1. Just click on date and pull it out of the table.
2. Click on Sum of total Charge and pull it out of the table.
3. Drag Adverse events down into Data Fields.
4. Then click on the Age field and drag it into the Row Fields:
Are there more younger women with complications or are there just more younger women giving birth? Does one physician have more complications than the others? We could run a pareto chart to show complications by physician (MD6 is 40% of total complications, almost twice as high as his or her peers):
Maybe I'd like to evaluate average length of stay (LOS) for all deliveries. Just pull the adverse events out of the table and drop in LOS (change it to an average). Not much going on here. The youngest and the oldest had a slightly longer length of stay:
Get the Idea?
There's a wealth of information hiding in these dense flat files of words and numbers. Start using the Excel's Pivot Table function to to slice and dice your files (no matter how large). Then use QI Macros to graph the results. You'll find it easy to find the 4% that leads to 50% of the problem and start making breakthrough improvements.