
Methodology
PEJ MEDIA REPORT CARD
CONTENT ANALYSIS
GENERAL METHODOLOGY
SAMPLING AND INCLUSION
Two distinct categories of media were studied as part of the 2005 PEJ Media Report Card project.
The first, text-based media, included newspapers and Internet news sites. Princeton Survey Research Associates International conducted coding for those media.
The second, electronic media, included both broadcast network and cable network news. The School of Journalism at Michigan State University conducted coding for Broadcast Network News. The Institute for Communication Research of the College of Communication & Information Sciences at the University of Alabama conducted coding for Cable Network News.
Print, broadcast network and cable were each subject to a specific methodological approach to sampling, selection and coding. In all, the study examined some 16,800 stories. This included 6,589 newspaper stories, 1,903 online stories, 1,768 stories from network television and about 6,550 stories on cable news (the cable news study included two parts, a 20-day sample and a five-day sample, in which some stories overlapped).
I. TEXT-BASED MEDIA
NEWSPAPERS
Newspaper Selection
Individual newspapers were selected to present a meaningful assessment of the content that is widely available to the public. Selections were made on both a geographic and a demographic basis, as well as diversity of ownership.
First, newspapers were divided into four groups based on daily circulation: Over 750,000; 300,001 to 750,000; 100,001 to 300,000, and 100,000 and under.
We included four newspapers over 750,000: USA Today, the Los Angeles Times, The New York Times, and The Washington Post. (The Wall Street Journal, which also falls in this category, was excluded as a specialty publication.)
Four newspapers were chosen in each of the remaining three categories. To ensure geographical diversity, each of the four newspapers within a circulation category was selected from a different geographic region of the U.S. Regions were defined according to the parameters established by the U.S. Census Bureau.1
The newspapers in circulation groups two through four were selected through the following process:
First, using the Editor and Publisher Yearbook, we created a list of every daily newspaper in the U.S. Within each category, newspapers were selected at random until all categories were filled. To be eligible for selection, a newspaper was required to a) have a Sunday section, b) have a daily sports section, c) have its stories indexed in a news database, to be available to coders, and d) not be a tabloid. Newspapers not meeting those criteria were skipped over. In addition, an effort was made to ensure diversity in ownership.
Circulation Group 1
Los Angeles Times
New York Times
USA Today
Washington Post
Circulation Group 2
Cleveland Plain Dealer
Dallas Morning News
Philadelphia Inquirer
Sacramento Bee
Circulation Group 3
Albuquerque Journal
Asbury Park Press
Kansas City Star
San Antonio Express-News
Circulation Group 4
Bloomington (Illinois) Pantagraph
Hanover (Pennsylvania) Evening Sun
McAllen (Texas) Monitor
Vacaville (California) Reporter
Newspaper Study Operative Dates, 2004
Random sampling was used to select a sample of individual days for the study. By choosing individual days rather than weeks, we hoped to provide a broader look at news coverage that more accurately represented the entire year. To account for variations related to the different days of the week, the 28 days that were sampled included 4 of each day of the week. Dates were chosen from January 1 to October 13, a span of 286 days. October 13 was made the cutoff date to allow time for coding. Omitted dates included those of the Olympics and the Republican and Democratic National Conventions.
The following dates were generated and make up the 2004 sample.
January- 13, 16, 23
February- 2, 13, 23, 29
March- 8, 12, 13, 14, 19, 24
April- 8, 15
May- 1, 4, 20
June- 8, 9, 16
July- 19, 25
August- 10, 12
September- 4, 22, 26
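The day-selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tooling: the function name `sample_days`, the seed, and the excluded window are all assumptions. The report does not list the exact omitted dates, so the exclusion set below is a placeholder.

```python
import random
from datetime import date, timedelta


def sample_days(start, end, per_weekday, excluded=frozenset(), seed=None):
    """Pick `per_weekday` random dates for each day of the week in [start, end]."""
    rng = random.Random(seed)
    # Group every eligible date by weekday (0 = Monday ... 6 = Sunday).
    by_weekday = {wd: [] for wd in range(7)}
    day = start
    while day <= end:
        if day not in excluded:
            by_weekday[day.weekday()].append(day)
        day += timedelta(days=1)
    chosen = []
    for wd in range(7):
        # Sampling without replacement inside each weekday stratum.
        chosen.extend(rng.sample(by_weekday[wd], per_weekday))
    return sorted(chosen)


# Placeholder exclusion window (assumption, for illustration only).
excluded = {date(2004, 8, 13) + timedelta(days=i) for i in range(17)}

sample = sample_days(date(2004, 1, 1), date(2004, 10, 13),
                     per_weekday=4, excluded=excluded, seed=0)
```

Stratifying by day of the week guarantees the 4-per-weekday balance the text describes, which a simple random draw of 28 dates would not.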
Story Procurement, Selection, and Inclusion
Stories were procured via hard copies of daily publications, supplemented by a combination of electronic databases (DIALOG, FACTIVA, and NEXIS).
All stories with distinct bylines that appeared on a particular newspaper's front page (Page A1), on the first page of the Local/Metro section, or on the first page of the sports section were selected for analysis.
INTERNET NEWS SITES
To select the Internet news sites to be coded, the Nielsen/NetRatings top 20 news sites list was consulted to determine the most prominent sites. The list contained four basic types of sites: news aggregators,2 newspaper sites, network news sites, and cable news sites. Two sites were chosen for each of those categories. For aggregators, AOL and Yahoo were selected; they were the only two aggregators in the top 20 list. For network news outlets, two sites were randomly chosen from among ABC, CBS, and MSNBC. (MSNBC appeared on both the network and cable lists because it is the news site for both NBC News and the MSNBC cable channel.) For cable sites, CNN and Fox News were chosen, since MSNBC had already been chosen from among the broadcast networks. For newspapers, the first site was chosen randomly from the four newspapers in Circulation Group 1, and the second was chosen randomly from the 12 newspapers in Groups 2 through 4. To be selected, the newspaper had to have an active daily Web site. In addition, a local-TV news site was chosen. The market for local TV was chosen by randomly selecting one of the 15 markets from the newspaper sample and then randomly choosing among ABC, CBS, NBC, and Fox.
The following sites were included in the 2004 study:
ABC News (www.ABCNEWS.com)
AOL (news section front page)
Bloomington Pantagraph (www.pantagraph.com)
CBS 11 TV - Dallas (www.cbs11tv.com)
CNN (www.cnn.com)
MSNBC (www.msnbc.com)
Washington Post (www.washingtonpost.com)
Yahoo! (news.yahoo.com)
Internet News Sites - Operative Dates 2004
The 2004 Internet study had two components. The first was a twenty-day sample that matched the dates of the newspaper sample, Mondays through Fridays. Weekends were not included for Internet, broadcast or cable sites. Again, the eligible dates ranged from January 1 to October 13, a period of 286 days.
The following dates were generated and constitute the 2004 Internet News Site sample.
January- 13, 16, 23
February- 2, 13, 23, 29
March- 8,* 12, 13, 14, 19,* 24
April- 8, 15
May- 1, 4, 20
June- 8, 9, 16*
July- 19, 25
August- 10,* 12*
September- 4, 22, 26
*Multiple Download Dates
In addition to the main sample, we conducted an additional study of five of those days in order to replicate the freshness variable studied in 2003. From the 20-day sample, one day for each weekday was randomly selected.
Story Procurement, Selection, and Inclusion
For the main 20-day sample, each site was visited once a day. The download time rotated each day among four different hours: 9:00 A.M., 1:00 P.M., 5:00 P.M. and 9:00 P.M. ET. The order in which the sites were visited was also rotated for each capture time. Each download took approximately twenty minutes.
For the five-day sample, each site was visited four times on each day - 9:00 a.m. ET, 1:00 p.m. ET, 5:00 p.m. ET, and 9:00 p.m. ET - to download stories. The order in which the sites were visited was rotated for each capture time. Each download took approximately twenty minutes.
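The rotation of capture times and site order for the 20-day sample can be illustrated with a short Python sketch. The report does not state the exact rotation rule, so a simple round-robin is assumed here; `rotation_schedule` and the day labels are hypothetical names used only for illustration.

```python
from collections import deque

# Capture times (ET) and sites as listed in the text.
CAPTURE_TIMES = ["9:00 a.m.", "1:00 p.m.", "5:00 p.m.", "9:00 p.m."]
SITES = ["ABC News", "AOL", "Bloomington Pantagraph", "CBS 11 TV - Dallas",
         "CNN", "MSNBC", "Washington Post", "Yahoo!"]


def rotation_schedule(days, sites):
    """Assign one capture time per day and rotate the site visiting order.

    Assumption: plain round-robin rotation for both the time and the order.
    """
    order = deque(sites)
    schedule = []
    for i, day in enumerate(days):
        capture_time = CAPTURE_TIMES[i % len(CAPTURE_TIMES)]
        schedule.append((day, capture_time, list(order)))
        # The site visited first moves to the end for the next capture.
        order.rotate(-1)
    return schedule


schedule = rotation_schedule([f"day {i}" for i in range(1, 21)], SITES)
```

Rotating both dimensions keeps any one site from always being captured first or always at the same hour, which matters because each download took about twenty minutes.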
Each time, the following method was used to determine which stories to capture:
On the news home page of each of the sites, we identified featured stories. A story at the top of a page tied in to a graphic element - commonly a picture of an event or person - was counted as a featured story and captured for study. Multiple stories on the page relating to the same graphic element were also captured as featured stories. Pages with more than one graphic element were considered to have more than one featured story, and all such stories were studied.
After the featured stories, we included the next three most prominent stories without graphics starting from the top and moving down. Those stories were recorded as non-featured.
The following rules were put into place in selecting stories:
Text-Based Media Coding Procedures
General practice called for a coder to work through no more than seven days/issues from any newspaper outlet during a coding session. After completing up to seven days/issues from one publication, coders switched to another text-based-media outlet, and continued to code up to seven days/issues.
All coding personnel rotated through all circulation groups, publications/sites, with the exception of the designated control publications. A control publication was chosen in each category of text media. The designated control publication/date was initially handled by only one coder. That work was then over-sampled during intercoder reliability testing.
Working with a standardized codebook and coding rules, coders generally worked through each story in its entirety, beginning with the Inventory Variables - publication date, story length, placement, and origination. Next, they recorded the codes for each story's "content variables" - topics, recurring leads/big stories, newsmakers, tone, sourcing levels, and frame. Additional variables for Internet outlets measured links to graphics, audio, video, and photo galleries; and for the five multiple-download days, an additional variable measured story freshness.
Intercoder Reliability Testing for Text Media
Intercoder reliability measures the extent to which two coders, operating individually, reach the same coding decisions. The principal coding team for text media comprised four people who were trained as a group. One coder was designated as a general control coder, and worked off-site for the duration of the project. In addition, one newspaper was designated as a control source.
At the completion of the general coding process, each coder, working alone and without access to the initial coding decisions, re-coded publications originally completed by another coder. Intercoding tests were performed on 5% of all cases in connection with inventory variables, and agreement rates exceeded 98% for those variables. For the more difficult content variables, 20% of all publications/sites were re-coded, and intercoder agreement rates were as follows:
Trigger: 93%
Politics Trigger: 97%
Big Story: 96%
Campaign Trigger: 98%
Topic: 92%
Newsmaker: 90%
Tone: 96%
Source Transparency: 95%
Anonymous Sources: 98%
Data: 97%
Female Sources: 98%
Male Sources: 97%
Mix of Viewpoints: 92%
Stakeholders: 90%
Journalist Opinion/Speculation: 90%
Dominant Frame: 88%
Additional Frame: 87%
No significant differences were found to exist on a recurring basis.
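As a rough illustration of how an agreement rate like those above is obtained, the sketch below computes simple percent agreement between an original coding pass and a re-coding pass. It assumes plain percent agreement with no chance correction; the function name and the tone codes are invented for the example and are not project data.

```python
def percent_agreement(codes_a, codes_b):
    """Share of cases (in percent) on which two coders chose the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must have coded the same cases")
    if not codes_a:
        raise ValueError("at least one case is required")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Invented tone codes for ten stories (illustration only).
original = ["positive", "neutral", "negative", "neutral", "neutral",
            "positive", "neutral", "negative", "neutral", "neutral"]
recoded = ["positive", "neutral", "negative", "neutral", "positive",
           "positive", "neutral", "negative", "neutral", "neutral"]

tone_agreement = percent_agreement(original, recoded)  # 9 of 10 cases match
```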
II. BROADCAST NETWORK NEWS
The ability to make direct comparisons between newspaper and broadcast network findings was a project design goal, so the weekday sample dates for those two news categories are identical. Because of preemptions and schedule changes, weekend network news broadcasts do not always appear in all markets, so Saturday and Sunday broadcast network news programs were excluded from the study.
On a handful of the sample dates, special events pre-empted the evening newscasts. In such instances an alternate date for the same day of the week was selected at random. The final dates were as follows:
January- 13, 16, 23
February- 2, 23
March- 8, 12, 19, 24
April- 8, 15
May- 4, 20
June- 8, 9, 16
July- 19
August- 10, 12
September- 15, 22
June 9 commercial network newscasts were not used because the programming was preempted by the ceremonies remembering President Ronald Reagan. NewsHour was studied on this date. September 15 was used as a substitute for June 9 for the network newscasts.
BROADCAST NETWORK MORNING NEWS PROGRAMS
(7:00 a.m. - 7:59 a.m. Eastern Time Airings)
ABC Good Morning America
CBS The Early Show
NBC The Today Show
BROADCAST NETWORK EVENING NEWS PROGRAMS
(Full program as broadcast in New York market)
ABC World News Tonight
CBS Evening News
NBC Nightly News
PBS NewsHour
Program Procurement and Story Selection and Inclusion
The morning and evening broadcasts were procured through both transcripts and video tape. Transcripts were obtained through the Nexis electronic database. Videotaped programs were captured live in the New York City market by ADT Research. For the evening newscasts, that represents each day's 6:30 P.M. East Coast feed. PBS supplied the Project with tapes of the NewsHour.
In the mornings, the following content was analyzed: stories read by the newscaster in the half-hourly news blocks; feature and interview segments outside of the news blocks; banter between members of the anchor team whose import was other than to tease coming segments in that day's program or to promote the network's programming at some later time. One-fifth, 20%, of the sample was coded for teasers and promos and analyzed separately. Excluded from the analysis were the content of the weather blocks, local news inserts, commercials, and other content-free editorial matter such as logos, studio shots, openings and closings.
In the evenings the same rules applied, but because the content of the newscasts is less variegated, concerns about news blocks, banter, weather blocks and local news inserts were not applicable.
Broadcast Network Coding Procedures
Faculty and graduate students in the School of Journalism at Michigan State University conducted this part of the project. The two faculty members who supervised the project have more than 40 years of combined social-science experience in conducting such studies, and are two of the most published academic researchers in the field. Two students in the mass-media Ph.D. program at MSU, one a third-year student and the other a second-year student, coded most of the stories, assisted by a master's-degree graduate of the MSU Department of Communication. In addition, two current master's-degree students in the School of Journalism coded parts of the newscasts. Coding was done independently, working from the protocol, without consultation among the coders.
The coding protocol was provided by the Project for Excellence in Journalism. It called for coding 23 variables for each story in each designated newscast. Nineteen variables required substantive categorical judgments by coders.
In the course of the training, a decision was made to use only the two doctoral students to code the last four study variables: presence of multiple viewpoints, journalist's opinion, dominant story frame and presence of multiple frames. Those variables required more familiarity with journalistic ethics and standards, and the two doctoral students could bring extensive professional as well as academic training to the task. Each network was coded in turn.
Inter-Coder Reliability Testing for Broadcast Network News
A coder reliability assessment for each completed network was then conducted with a random sample of dates taken from those supplied by the State of the Media project. This usually consisted of one or two days used in the assessment from the total of days sampled, resulting in a sample of 5% to 10% of the total stories coded.
Percentages of agreement calculations were made to assess the coding for each of the variables requiring categorical choices among variable values.
Fifty-three stories from the evening newscasts and 69 from the morning newscasts (a total of 122 stories, or 7% of all stories) were used to test reliability. All of the variables used in the State of the Media analysis presented here achieved at least 90% inter-coder agreement, except story topic. The original story-topic coding scheme involved more than 300 subcategories, and reliability was below 80%. But when the coding was collapsed into the 12 categories used in this analysis, the inter-coder agreement reached 83% for all stories.
The content categories used in this analysis and their inter-coder agreement were: story prominence, 98%; story origin, 95%; big story, 95%; story topic, 83%; story tone, 91%; source transparency, 91%; anonymous sources, 94%; female sources, male sources 98%; multiple viewpoints, 93%; and presence of journalist opinion, 91%.
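The effect of collapsing fine-grained topic subcategories into broader categories can be shown with a small sketch. Two coders who disagree on a subcategory may still agree once both codes map to the same broad category. The mapping and codes below are invented for illustration and are not the project's codebook.

```python
def agreement_before_and_after(codes_a, codes_b, collapse):
    """Return percent agreement on fine codes and on collapsed codes."""
    pairs = list(zip(codes_a, codes_b))
    n = len(pairs)
    fine = sum(a == b for a, b in pairs)
    broad = sum(collapse[a] == collapse[b] for a, b in pairs)
    return 100.0 * fine / n, 100.0 * broad / n


# Hypothetical subcategory -> broad category mapping.
collapse = {
    "senate race": "elections",
    "house race": "elections",
    "murder trial": "crime",
    "burglary": "crime",
}

coder_a = ["senate race", "murder trial", "burglary", "house race"]
coder_b = ["house race", "murder trial", "murder trial", "house race"]

fine_pct, broad_pct = agreement_before_and_after(coder_a, coder_b, collapse)
```

In this toy case agreement rises from 50% on subcategories to 100% on broad categories, the same direction of change the text reports for story topic.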
Table 1: Coder Reliability for Evening News Shows
| | ABC (N=17) | CBS (N=10) | NBC (N=12) | PBS (N=14) |
|---|---|---|---|---|
| Dateline | 96% | 100% | 100% | 100% |
| Story Prominence | 92% | 100% | 95% | 100% |
| Story Origin | 92% | 93% | 100% | 95% |
| Story Trigger | 61% | 60% | 78% | 86% |
| Party Trigger | 92% | 93% | 100% | 100% |
| Big Story | 80% | 93% | 100% | 100% |
| Campaign Topic | 100% | 87% | 100% | 100% |
| Story Topic* | 76% | 80% | 67% | 100% |
| Lead Newsmaker | 72% | 63% | 63% | 66% |
| Story Tone | 92% | 100% | 74% | 81% |
| Transparency | 80% | 73% | 93% | 100% |
| Anonymous Sources | 88% | 100% | 89% | 100% |
| Data Transparency | 96% | 100% | 93% | 100% |
| Female Source Number | 96% | 100% | 100% | 100% |
| Male Source Number | 90% | 90% | 100% | 95% |
| Multiple Viewpoints | 94% | 80% | 100% | 86% |
| Reporter Speculation | 82% | 80% | 89% | 93% |
| Dominant Frame | 94% | 70% | 100% | 93% |
| Multiple Frames | 88% | 100% | 89% | 86% |

*Achieved reliability after collapsing topic categories.
Table 2: Coder Reliability for Morning News Shows
| | ABC (N=22) | CBS (N=17) | NBC (N=30) |
|---|---|---|---|
| Dateline | 97% | 94% | 100% |
| Story Prominence | 93% | 65% | 100% |
| Story Origin | 86% | 100% | 100% |
| Story Trigger | 80% | 90% | 96% |
| Party Trigger | 94% | 100% | 100% |
| Big Story | 92% | 100% | 100% |
| Campaign Topic | 100% | 100% | 100% |
| Story Topic* | 73% | 91% | 95% |
| Lead Newsmaker | 85% | 84% | 93% |
| Story Tone | 88% | 100% | 95% |
| Transparency | 88% | 96% | 100% |
| Anonymous Sources | 83% | 96% | 100% |
| Data Transparency | 97% | 100% | 100% |
| Female Source Number | 95% | 96% | 100% |
| Male Source Number | 85% | 100% | 100% |
| Multiple Viewpoints | 100% | 88% | 100% |
| Reporter Speculation | 100% | 88% | 100% |
| Dominant Frame | 100% | 88% | 94% |
| Multiple Frames | 91% | 94% | 94% |

*Achieved reliability after collapsing topic categories.
Table 3: Coder Reliability Summary for Evening News, Morning News and All News Programs
| | Evening News (N=53) | Morning News (N=69) | All (N=122) |
|---|---|---|---|
| Dateline | 99% | 97% | 98% |
| Story Prominence | 97% | 86% | 91.5% |
| Story Origin | 95% | 95% | 95% |
| Story Trigger | 71% | 89% | 80% |
| Party Trigger | 96% | 98% | 97% |
| Big Story | 93% | 97% | 95% |
| Campaign Topic | 97% | 100% | 98.5% |
| Story Topic* | 81% | 86% | 83.4% |
| Lead Newsmaker | 66% | 87% | 76.5% |
| Story Tone | 87% | 94% | 90.5% |
| Transparency | 87% | 95% | 91% |
| Anonymous Sources | 94% | 93% | 93.5% |
| Data Transparency | 97% | 99% | 98% |
| Female Source Number | 99% | 97% | 98% |
| Male Source Number | 94% | 95% | 94.5% |
| Multiple Viewpoints | 90% | 96% | 93% |
| Reporter Speculation | 86% | 96% | 91% |
| Dominant Frame | 89% | 94% | 91.5% |
| Multiple Frames | 91% | 93% | 92% |

*Achieved reliability after collapsing topic categories.
III. CABLE NEWS
Cable News Programming - Outlet Selection and Operative Dates 2004
As with the online sample, the 2004 Cable study had two components. The first was a twenty-day sample that matched the dates of the newspaper sample on Mondays through Fridays. Weekends were not included for the Internet, broadcast or cable. Again, the eligible dates ranged from January 1 to October 13, a period of 286 days. On a handful of the sample dates, special events pre-empted the evening newscasts. In such instances an alternate date for the same day of the week was selected at random.
The following dates were generated and make up the 2004 cable news sample:
January- 13, 16, 23
February- 2,* 23
March- 8, 12, 19,* 24
April- 15
May- 4,* 20*
June- 8, 9, 16*
July- 19
August- 5, 10, 12
September- 22
* Indicates cable station programming was taped continuously from 7 a.m. to 11 p.m.
In addition to the main sample, we also conducted an additional study of five of these days to replicate the freshness variable studied in 2003. From the 20-day sample, one day for each weekday was randomly selected. These days were:
February 2
March 19
May 4
May 20
June 16
Story Procurement and Inclusion
To assess the nature of the 24-hour news cycle as presented on cable news programming, CNN, Fox News, and MSNBC were selected because they were the three most-viewed cable news channels in 2003.
For the twenty-day sample, we selected three program types to study at each network: daytime programming, the closest thing to a traditional newscast, and the highest-rated prime-time talk show. The following programs were captured and analyzed:
DAYTIME PROGRAMMING
The 11-to-12 o'clock hour for each network
NEWSCAST/NEWS DIGEST PROGRAMS
CNN's NewsNight with Aaron Brown
FOX's Special Report with Brit Hume
MSNBC's Countdown with Keith Olbermann
PRIME-TIME TALK PROGRAMS
CNN's Larry King Live
FOX's O'Reilly Factor
MSNBC's Hardball with Chris Matthews
For the five-day sample, all programming was captured and coded from 7 a.m. (the beginning of the morning shows) until 11 p.m. (the end of prime time), a 16-hour stretch of programming. This resulted in some 240 hours of programming.
All cable programming was procured through both videotape and transcripts, although transcripts were not available for the Fox News programming at the 11:00 a.m. hour. Transcripts were obtained through the Nexis electronic database. Videotaped programs were captured live in the Washington, D.C. market. In some instances tapes were provided to us by VMS, a commercial third-party monitoring service.
Cable News Coding Procedures
The cable news coding was conducted by faculty members, graduate students, and research staff people affiliated with the Institute for Communication and Information Research at the University of Alabama. Six coders were involved throughout the coding process. All coders worked independently, without consulting one another regarding specific coding decisions.
Cable News Inter-coder Reliability Testing
As noted, three program types were studied for each of the three cable news networks. To assess reliability within and across program types, we randomly selected six of the 60 hours of daytime programming, six of the 60 hours of news-digest programming, and six of the 60 hours of prime-time talk programming. In other words, the reliability sample was stratified by program type.
The reliability sample was also stratified by network. Within the six hours for each program type we included two hours from each of the three networks.
This 18-hour sample represents 10% of the 180 hours of programming included in the study; the 6-hour sample for each program type represents 10% of the 60 hours dedicated to each of the three program types.
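The two-way stratification described above (by program type and by network) can be sketched as follows. Each program hour is identified here by a hypothetical tuple of program type, network, and sample-day index; the function name and seed are assumptions made for illustration.

```python
import random
from itertools import product

NETWORKS = ["CNN", "Fox News", "MSNBC"]
PROGRAM_TYPES = ["daytime", "news digest", "prime-time talk"]


def reliability_sample(n_days=20, hours_per_cell=2, seed=None):
    """Draw `hours_per_cell` program hours for every (type, network) cell."""
    rng = random.Random(seed)
    sample = []
    for program_type, network in product(PROGRAM_TYPES, NETWORKS):
        # One hour per network per sample day: 20 candidate hours per cell.
        days = rng.sample(range(1, n_days + 1), hours_per_cell)
        sample.extend((program_type, network, d) for d in sorted(days))
    return sample


hours = reliability_sample(seed=1)
total_hours = 20 * len(NETWORKS) * len(PROGRAM_TYPES)  # 180 hours in the study
share = len(hours) / total_hours                        # 18 / 180
```

Drawing two hours from each of the nine cells yields six hours per program type and 18 hours overall, matching the 10% figures in the text.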
Percentages of agreement calculations were made to assess the coding for each of the variables requiring categorical choices among variable values.
All of the variables used in the State of the Media analysis presented here achieved at least 88% inter-coder agreement, ranging from a low of 88% for presence of journalist opinion to a high of 98% for story prominence.
IV. STATISTICAL ANALYSES IN THE CONTENT ANALYSIS
For each media subcategory - newspapers, Internet news sites, broadcast network news and cable network news - separate datasets were created, and separate tabulations were constructed. Whenever comparisons were made across content categories within a single medium, the chi square statistic was employed to determine whether the comparisons were based on statistically different observations. Data from all the channels were aggregated into a single file so that cross-media comparisons could be made. Again, whenever a cross-media comparison is referred to as showing different patterns, that statistical difference in those patterns was determined with the chi square statistic.
For much of this report, the individual news story is the unit of analysis. There are, however, selected variables where it was more informative to present analysis through a measurement of the time or words devoted to particular topics or recurring leads.
Within each universe (cable, newspapers, etc.), each case in the applicable SPSS dataset represents one story. Length is one of the measurements recorded for each case. (Note: for network and cable, this number represents seconds; for newspapers and news magazines, this number represents word count; for Internet, no volumetric analysis was applied.)
To create the volumetric tables, each case was selected, and the number recorded in the Length variable was designated as a weight. Then, that individual weight was applied to each individual case. The resulting weighted dataset was used in the production of volumetric tables for selected variables.
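The weighting step can be illustrated outside SPSS with a short Python sketch. Each story's length (seconds or words) is used as its weight, so a category's share reflects the volume devoted to it and not the number of stories. The stories and field names below are invented for the example.

```python
from collections import defaultdict


def volumetric_shares(stories, variable):
    """Percent of total length (seconds or words) falling in each category."""
    totals = defaultdict(float)
    for story in stories:
        # The story's length acts as its case weight.
        totals[story[variable]] += story["length"]
    grand_total = sum(totals.values())
    return {category: 100.0 * total / grand_total
            for category, total in totals.items()}


# Invented cases: one long government story, two short crime stories.
stories = [
    {"topic": "government", "length": 150},
    {"topic": "crime", "length": 30},
    {"topic": "crime", "length": 20},
]

shares = volumetric_shares(stories, "topic")
```

By story count, crime would account for two of three stories here; by volume it accounts for only a quarter of the total, which is the distinction the volumetric tables are meant to capture.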
Statistical Analyses
For most comparisons of how content and structure of the news vary as a function of which medium is being examined, chi square analyses were used. Chi square is a non-parametric statistic that examines the relationship between nominal variables, that is, variables that are identified by "name" and are not on a numeric scale (e.g., CNN, MSNBC, and Fox News are values of a nominal variable). As noted by Riffe, Lacy, & Fico3 (1998), pp. 167-168:
"The chi-square test of statistical significance is based on the assumption that the randomly sampled data appropriately described, within sampling error, the population's proportions of cases falling into the categorical values of the variables being tested.
Chi-square starts with the assumption that there is in the population only random association between the two variables, and that any sample finding to the contrary is merely a sampling artifact.
For each cell in a table linking the two variables, chi-square calculates the theoretical expected proportions based in a posited null relationship. The empirically obtained data are then compared cell by cell with the expected null-relationship proportions. Specifically the absolute value of the differences between the observed and expected values in each cell goes into the computation of the chi-square statistic. Therefore, the chi-square statistic is large when the differences between the empirical and theoretical cell frequencies are large, and small when the empirically obtained data more closely resemble the pattern of the null relationship.
This chi-square statistic has known values that permit a researcher to reject the null hypothesis (no relationship between the variables) at the standard 95% and 99% levels of probability."
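To make the computation concrete, the sketch below calculates a chi-square statistic for a small contingency table. It uses the standard Pearson form, in which each cell contributes the squared difference between observed and expected counts divided by the expected count. The counts are invented for illustration, and the critical values shown are the standard tabled values for one and two degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Critical values by degrees of freedom: (95% level, 99% level).
CRITICAL = {1: (3.841, 6.635), 2: (5.991, 9.210)}

# Invented counts: rows are two outlets, columns are two tone categories.
observed = [[30, 20],
            [20, 30]]

stat, df = chi_square(observed)
significant_95 = stat > CRITICAL[df][0]
significant_99 = stat > CRITICAL[df][1]
```

With these made-up counts the statistic is 4.0 on one degree of freedom, which exceeds the 95% critical value but not the 99% one.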