


Methodology

PEJ MEDIA REPORT CARD

CONTENT ANALYSIS

GENERAL METHODOLOGY


SAMPLING AND INCLUSION

Two distinct categories of media were studied as part of the 2005 PEJ Media Report Card project.

The first, text-based media, included newspapers and Internet news sites. Princeton Survey Research Associates International conducted coding for those media.

The second, electronic media, included both broadcast network and cable network news. The School of Journalism at Michigan State University conducted coding for Broadcast Network News. The Institute for Communication Research of the College of Communication & Information Sciences at the University of Alabama conducted coding for Cable Network News.

Print, broadcast network and cable were each subject to a specific methodological approach to sampling, story selection, and coding. In all, the study examined some 16,800 stories: 6,589 newspaper stories, 1,903 online stories, 1,768 stories from network television and about 6,550 stories on cable news. (The cable news study had two parts, a 20-day sample and a five-day sample, and some stories appeared in both.)

I. TEXT-BASED MEDIA

NEWSPAPERS

Newspaper Selection

Individual newspapers were selected to present a meaningful assessment of the content that is widely available to the public. Selections were made on both geographic and demographic grounds, with attention also given to diversity of ownership.

First, newspapers were divided into four groups based on daily circulation: Over 750,000; 300,001 to 750,000; 100,001 to 300,000, and 100,000 and under.

We included four newspapers over 750,000: USA Today, the Los Angeles Times, The New York Times, and The Washington Post. (The Wall Street Journal, which also falls in this category, was excluded as a specialty publication.)

Four newspapers were chosen in each of the remaining three categories. To ensure geographical diversity, each of the four newspapers within a circulation category was selected from a different geographic region of the U.S. Regions were defined according to the parameters established by the U.S. Census Bureau.1

The newspapers in circulation groups two through four were selected through the following process:

First, using the Editor and Publisher Yearbook, we created a list of every daily newspaper in the U.S. Within each category, newspapers were selected at random until all categories were filled. To be eligible for selection, a newspaper was required to a) have a Sunday section, b) have a daily sports section, c) have its stories indexed in a news database, so that they would be available to coders, and d) not be a tabloid. Newspapers not meeting those criteria were skipped over. In addition, an effort was made to ensure diversity in ownership.
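The selection procedure for groups two through four can be sketched as a shuffled draw with skip rules. This is a minimal illustration under stated assumptions: the `Paper` record, its field names, and the toy data are hypothetical; the circulation bands come from the groups defined above; and the informal effort to diversify ownership is not modeled.

```python
import random
from dataclasses import dataclass

@dataclass
class Paper:  # hypothetical record for one yearbook entry
    name: str
    circulation: int
    region: str          # Census Bureau region
    has_sunday: bool
    has_daily_sports: bool
    indexed: bool        # stories available in a news database
    tabloid: bool

# Circulation bands for groups 2-4, as defined in the text.
BANDS = {2: (300_001, 750_000), 3: (100_001, 300_000), 4: (0, 100_000)}

def eligible(p: Paper) -> bool:
    """Criteria a) through d): Sunday section, daily sports, indexed, not a tabloid."""
    return p.has_sunday and p.has_daily_sports and p.indexed and not p.tabloid

def select_group(papers, group, rng, per_group=4):
    """Draw papers at random from one band, skipping ineligible papers and
    papers from a region already represented in the group."""
    lo, hi = BANDS[group]
    pool = [p for p in papers if lo <= p.circulation <= hi]
    rng.shuffle(pool)  # random order of draws
    chosen, regions = [], set()
    for p in pool:
        if len(chosen) == per_group:
            break
        if eligible(p) and p.region not in regions:
            chosen.append(p)
            regions.add(p.region)
    return chosen

# Toy data: per region, one eligible paper, one tabloid and one smaller paper.
REGIONS = ["Northeast", "Midwest", "South", "West"]
papers = []
for r in REGIONS:
    papers.append(Paper(f"{r} Daily", 400_000, r, True, True, True, False))
    papers.append(Paper(f"{r} Tabloid", 400_000, r, True, True, True, True))
    papers.append(Paper(f"{r} Small Daily", 50_000, r, True, True, True, False))
group2 = select_group(papers, 2, random.Random(0))
```

Shuffling the band once and walking through it is equivalent to repeated random draws without replacement, which keeps the skip rules easy to read.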

Circulation Group 1

Los Angeles Times

New York Times

USA Today

Washington Post

Circulation Group 2

Cleveland Plain Dealer

Dallas Morning News

Philadelphia Inquirer

Sacramento Bee

Circulation Group 3

Albuquerque Journal

Asbury Park Press

Kansas City Star

San Antonio Express-News

Circulation Group 4

Bloomington (Illinois) Pantagraph

Hanover (Pennsylvania) Evening Sun

McAllen (Texas) Monitor

Vacaville (California) Reporter

Newspaper Study Operative Dates, 2004

Random sampling was used to select individual days for the study. By choosing individual days rather than whole weeks, we hoped to provide a broader look at news coverage that more accurately represented the entire year. To account for variation across the days of the week, the 28 sampled days included four of each day of the week. Dates were chosen from January 1 to October 13, a span of 286 days. October 13 was made the cutoff date to allow time for coding. Omitted dates included those of the Olympics and the Republican and Democratic National Conventions.
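The day-sampling design can be sketched as random sampling stratified by day of the week: group the eligible dates by weekday, then draw four from each group. In this minimal sketch the blackout windows for the Olympics and the two conventions are assumptions (the text does not give exact dates), as is treating October 13 as an inclusive cutoff; the function name is hypothetical.

```python
import datetime as dt
import random

def sample_days(start, end, blackouts, per_weekday=4, seed=None):
    """Draw `per_weekday` random dates for each weekday (Mon..Sun) from the
    inclusive range [start, end], skipping dates inside any blackout window."""
    rng = random.Random(seed)
    by_weekday = {wd: [] for wd in range(7)}
    day = start
    while day <= end:
        if not any(lo <= day <= hi for lo, hi in blackouts):
            by_weekday[day.weekday()].append(day)
        day += dt.timedelta(days=1)
    picks = []
    for days in by_weekday.values():
        picks.extend(rng.sample(days, per_weekday))
    return sorted(picks)

# Assumed blackout windows: Democratic convention, Olympics, Republican convention.
BLACKOUTS = [
    (dt.date(2004, 7, 26), dt.date(2004, 7, 29)),
    (dt.date(2004, 8, 13), dt.date(2004, 8, 29)),
    (dt.date(2004, 8, 30), dt.date(2004, 9, 2)),
]
sample = sample_days(dt.date(2004, 1, 1), dt.date(2004, 10, 13), BLACKOUTS, seed=2004)
```

Sampling within each weekday, rather than from the whole range, guarantees the four-per-weekday balance instead of leaving it to chance.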

The following dates were generated and make up the 2004 sample.


January- 13, 16, 23
February- 2, 13, 23, 29
March- 8, 12, 13, 14, 19, 24
April- 8, 15
May- 1, 4, 20
June- 8, 9, 16
July- 19, 25
August- 10, 12
September- 4, 22, 26


Story Procurement, Selection, and Inclusion

Stories were procured via hard copies of daily publications, supplemented by a combination of electronic databases (DIALOG, FACTIVA, and NEXIS).

All stories with distinct bylines that appeared on a particular newspaper's front page (Page A1), on the first page of the Local/Metro section, or on the first page of the sports section were selected for analysis.

INTERNET NEWS SITES

To select the Internet news sites to be coded, we consulted the Nielsen/NetRatings list of the top 20 news sites to determine the most prominent sites. The list contained four basic types of sites: news aggregators,2 newspaper sites, network news sites, and cable news sites. Two sites were chosen for each of those categories. For aggregators, AOL and Yahoo were selected; they were the only two aggregators on the top 20 list. For network news outlets, two sites were randomly chosen from among ABC, CBS, and MSNBC. (MSNBC appeared on both the network and cable lists because it is the news site for both NBC News and the MSNBC cable channel.) For cable sites, CNN and Fox News were chosen, since MSNBC had already been chosen from among the broadcast networks. For newspapers, the first site was chosen randomly from the four newspapers in Circulation Group 1, and the second was chosen randomly from the 12 newspapers in Groups 2 through 4; to be selected, a newspaper had to have an active daily Web site. In addition, a local-TV news site was chosen by randomly selecting one of the 15 markets from the newspaper sample and then randomly choosing among ABC, CBS, NBC, and Fox.

The following sites were included in the 2004 study:

ABC News (www.ABCNEWS.com)

AOL (news section front page)

Bloomington Pantagraph (www.pantagraph.com)

CBS 11 TV - Dallas (www.cbs11tv.com)

CNN (www.cnn.com)

Fox News (www.foxnews.com)

MSNBC (www.msnbc.com)

Washington Post (www.washingtonpost.com)

Yahoo! (news.yahoo.com)

Internet News Sites - Operative Dates 2004

The 2004 Internet study had two components. The first was a twenty-day sample that matched the dates of the newspaper sample, Mondays through Fridays. Weekends were not included for Internet, broadcast or cable sites. Again, the eligible dates ranged from January 1 to October 13, a period of 286 days.

The following dates were generated and constitute the 2004 Internet News Site sample.


January- 13, 16, 23
February- 2, 13, 23, 29
March- 8,* 12, 13, 14, 19,* 24
April- 8, 15
May- 1, 4, 20
June- 8, 9, 16*
July- 19, 25
August- 10,* 12*
September- 4, 22, 26
*Multiple Download Dates


In addition to the main sample, we studied five of those days more intensively in order to replicate the freshness variable studied in 2003. From the 20-day sample, one day for each weekday was randomly selected.
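Picking the five multiple-download days amounts to one random draw per weekday from the main sample. A minimal sketch, assuming the sample is held as a list of `datetime.date` values; the function name is hypothetical, and the input is a subset of the sample dates listed above.

```python
import datetime as dt
import random

def pick_one_per_weekday(days, seed=None):
    """Return one randomly chosen date for each weekday, Monday (0) to Friday (4)."""
    rng = random.Random(seed)
    by_weekday = {}
    for d in days:
        by_weekday.setdefault(d.weekday(), []).append(d)
    return sorted(rng.choice(by_weekday[wd]) for wd in range(5))

main_sample = [dt.date(2004, m, d) for m, d in [
    (1, 13), (1, 16), (2, 2), (3, 8), (3, 19),
    (3, 24), (4, 8), (6, 9), (8, 10), (8, 12),
]]
five_days = pick_one_per_weekday(main_sample, seed=7)
```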

Story Procurement, Selection, and Inclusion

For the main 20-day sample, each site was visited once a day. The download time rotated each day among four different hours: 9:00 A.M., 1:00 P.M., 5:00 P.M. and 9:00 P.M. ET. The order in which the sites were visited was also rotated for each capture time. Each download took approximately twenty minutes.
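The rotation can be sketched as a simple round-robin: the download hour cycles through the four slots from one day to the next, and the site order shifts by one position at each capture. The text does not spell out the actual rotation scheme, so this is an assumption; the site labels are taken from the list above and the function name is hypothetical.

```python
SLOTS = ["9:00 a.m. ET", "1:00 p.m. ET", "5:00 p.m. ET", "9:00 p.m. ET"]
SITES = ["ABC News", "AOL", "Bloomington Pantagraph", "CBS 11 TV", "CNN",
         "Fox News", "MSNBC", "Washington Post", "Yahoo!"]

def download_plan(n_days, slots=SLOTS, sites=SITES):
    """One capture per day: the hour cycles through `slots`, and the visiting
    order is the site list rotated left by the day index (mod its length)."""
    plan = []
    for day in range(n_days):
        k = day % len(sites)
        plan.append((day + 1, slots[day % len(slots)], sites[k:] + sites[:k]))
    return plan

for day, slot, order in download_plan(20)[:3]:
    print(day, slot, "first site:", order[0])
```

Over 20 days each of the four hours is used five times, and no site is always captured first or last.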

For the five-day sample, each site was visited four times on each day - 9:00 a.m. ET, 1:00 p.m. ET, 5:00 p.m. ET, and 9:00 p.m. ET - to download stories. The order in which the sites were visited was rotated for each capture time. Each download took approximately twenty minutes.

Each time, the following method was used to determine which stories to capture:

On the news home page of each site, we identified featured stories. A story at the top of the page that was tied to a graphic element, commonly a picture of an event or person, was counted as a featured story and captured for study. Multiple stories on the page relating to the same graphic element were also captured as featured stories. Pages with more than one graphic element were considered to have more than one featured story, and all such stories were studied.

After the featured stories, we included the next three most prominent stories without graphics, working from the top of the page down. Those stories were recorded as non-featured.
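The capture rule can be expressed as a small function over a page's stories listed from top to bottom. This is a simplified sketch under assumptions: each story is tagged with the graphic element it is tied to (or none), and prominence among stories without graphics is approximated by position on the page. The `Item` type and the sample page are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

class Item(NamedTuple):
    headline: str
    graphic: Optional[str]  # id of the graphic element the story is tied to, or None

def capture(page: List[Item], n_plain: int = 3) -> Tuple[List[str], List[str]]:
    """Return (featured, non_featured) headlines for one home-page snapshot.

    Featured: every story tied to any graphic element (several may share one).
    Non-featured: the next `n_plain` stories without graphics, top to bottom.
    """
    featured = [it.headline for it in page if it.graphic is not None]
    plain = [it.headline for it in page if it.graphic is None][:n_plain]
    return featured, plain

page = [
    Item("Lead story", "photo1"),
    Item("Sidebar on lead story", "photo1"),
    Item("Brief A", None),
    Item("Second photo story", "photo2"),
    Item("Brief B", None),
    Item("Brief C", None),
    Item("Brief D", None),
]
featured, non_featured = capture(page)
```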

The following rules were put into place in selecting stories:

Text-Based Media Coding Procedures

General practice called for a coder to work through no more than seven days/issues from any newspaper outlet during a coding session. After completing up to seven days/issues from one publication, coders switched to another text-based media outlet and continued to code up to seven days/issues.

All coders rotated through all circulation groups and publications/sites, except for the designated control publications. A control publication was chosen in each category of text media. Each designated control publication/date was initially handled by only one coder, and that work was then over-sampled during intercoder reliability testing.

Working with a standardized codebook and coding rules, coders generally worked through each story in its entirety, beginning with the Inventory Variables - publication date, story length, placement, and origination. Next, they recorded the codes for each story's "content variables" - topics, recurring leads/big stories, newsmakers, tone, sourcing levels, and frame. Additional variables for Internet outlets measured links to graphics, audio, video, and photo galleries; and for the five multiple-download days, an additional variable measured story freshness.

Intercoder Reliability Testing for Text Media

Intercoder reliability measures the extent to which two coders, operating individually, reach the same coding decisions. The principal coding team for text media comprised four people who were trained as a group. One coder was designated as a general control coder, and worked off-site for the duration of the project. In addition, one newspaper was designated as a control source.

At the completion of the general coding process, each coder, working alone and without access to the initial coding decisions, re-coded publications originally completed by another coder. Intercoder tests were performed on 5% of all cases for the inventory variables, and agreement rates exceeded 98% for those variables. For the more difficult content variables, 20% of all publications/sites were re-coded, and intercoder agreement rates were as follows:

Trigger: 93%

Politics Trigger: 97%

Big Story: 96%

Campaign Trigger: 98%

Topic: 92%

Newsmaker: 90%

Tone: 96%

Source Transparency: 95%

Anonymous Sources: 98%

Data: 97%

Female Sources: 98%

Male Sources: 97%

Mix of Viewpoints: 92%

Stakeholders: 90%

Journalist Opinion/Speculation: 90%

Dominant Frame: 88%

Additional Frame: 87%

No significant differences between coders were found on a recurring basis.
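The agreement rates above are reported as percentages. Assuming they are simple percent agreement (the share of re-coded cases on which both coders chose the same value), the calculation for one variable looks like this; the function name and the tone codes are hypothetical.

```python
def percent_agreement(original, recode):
    """Percent of re-coded cases on which the second coder matched the first."""
    if not original or len(original) != len(recode):
        raise ValueError("need two non-empty code lists of equal length")
    matches = sum(a == b for a, b in zip(original, recode))
    return 100.0 * matches / len(original)

# Hypothetical tone codes for ten re-coded stories.
original = ["pos", "neg", "neutral", "neg", "pos", "neutral", "neg", "pos", "pos", "neg"]
recode = ["pos", "neg", "neutral", "pos", "pos", "neutral", "neg", "pos", "neg", "neg"]
print(percent_agreement(original, recode))  # 80.0
```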

II. BROADCAST NETWORK NEWS

The ability to make direct comparisons between newspaper and broadcast network findings was a project design goal, so the weekday sample dates for those two news categories are identical. Because of preemptions and schedule changes, weekend network news broadcasts do not always appear in all markets, so Saturday and Sunday broadcast network news programs were excluded from the study.

On a handful of the sample dates, special events pre-empted the evening newscasts. In such instances an alternate date for the same day of the week was selected at random. The final dates were as follows:


January- 13, 16, 23
February- 2, 23
March- 8, 12, 19, 24
April- 8, 15
May- 4, 20
June- 8, 9, 16
July- 19
August- 10, 12
September- 15, 22
The commercial network newscasts of June 9 were not used because they were preempted by ceremonies honoring President Ronald Reagan. The NewsHour was studied on that date, and September 15 was used as a substitute for June 9 for the commercial network newscasts.


BROADCAST NETWORK MORNING NEWS PROGRAMS

(7:00 a.m. - 7:59 a.m. Eastern Time Airings)

ABC Good Morning America

CBS The Early Show

NBC The Today Show

BROADCAST NETWORK EVENING NEWS PROGRAMS

(Full program as broadcast in New York market)

ABC World News Tonight

CBS Evening News

NBC Nightly News

PBS NewsHour

Program Procurement and Story Selection and Inclusion

The morning and evening broadcasts were procured through both transcripts and videotape. Transcripts were obtained through the Nexis electronic database. Videotaped programs were captured live in the New York City market by ADT Research. For the evening newscasts, that means each day's 6:30 P.M. East Coast feed. PBS supplied the Project with tapes of the NewsHour.

In the mornings, the following content was analyzed: stories read by the newscaster in the half-hourly news blocks; feature and interview segments outside the news blocks; and banter among members of the anchor team, unless its only purpose was to tease upcoming segments in that day's program or to promote the network's later programming. One-fifth (20%) of the sample was coded for teasers and promos, which were analyzed separately. Excluded from the analysis were weather blocks, local news inserts, commercials, and other content-free editorial matter such as logos, studio shots, openings and closings.

In the evenings the same rules applied, but because the content of the newscasts is less variegated, concerns about news blocks, banter, weather blocks and local news inserts were not applicable.

Broadcast Network Coding Procedures

Faculty and graduate students in the School of Journalism at Michigan State University conducted this part of the project. The two faculty members who supervised the project have more than 40 years of combined social-science experience in conducting such studies and are two of the most published academic researchers in the field. Two students in the mass-media Ph.D. program at MSU, one a third-year student and the other a second-year student, coded most of the stories, assisted by a master's-degree graduate of the MSU Department of Communication. In addition, two current master's-degree students in the School of Journalism coded parts of the newscasts. Coders worked independently from the protocol, without consulting one another.

The coding protocol was provided by the Project for Excellence in Journalism. It called for coding 23 variables for each story in each designated newscast. Nineteen variables required substantive categorical judgments by coders.

In the course of the training, a decision was made to use only the two doctoral students to code the last four study variables: presence of multiple viewpoints, journalist's opinion, dominant story frame and presence of multiple frames. Those variables required more familiarity with journalistic ethics and standards, and the two doctoral students could bring extensive professional as well as academic training to the task. Each network was coded in turn.

Inter-Coder Reliability Testing for Broadcast Network News

A coder reliability assessment for each completed network was then conducted with a random sample of dates taken from those supplied by the State of the Media project. Usually one or two of the sampled days were used in the assessment, yielding a sample of 5% to 10% of the total stories coded.

Percent-agreement figures were calculated for each variable that required a categorical choice among variable values.

Fifty-three stories from the evening newscasts and 69 from the morning newscasts (a total of 122 stories, or 7% of all stories) were used to test reliability. All of the variables used in the State of the Media analysis presented here achieved at least 90% inter-coder agreement, except story topic. The original story-topic coding scheme involved more than 300 subcategories, and reliability was below 80%. But when the coding was collapsed into the 12 categories used in this analysis, the inter-coder agreement reached 83% for all stories.

The content categories used in this analysis and their inter-coder agreement were: story prominence, 98%; story origin, 95%; big story, 95%; story topic, 83%; story tone, 91%; source transparency, 91%; anonymous sources, 94%; female sources, male sources 98%; multiple viewpoints, 93%; and presence of journalist opinion, 91%.
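Why collapsing the topic scheme raises agreement can be shown with a toy example: two coders who pick different subcategories under the same broad heading disagree at the fine level but agree once codes are mapped to the broad categories. The subtopics, the mapping, and the codes below are hypothetical and are not the study's actual 300-plus subcategories or 12 categories.

```python
# Hypothetical fine-grained subtopics mapped to broad topic categories.
COLLAPSE = {
    "presidential race": "Campaign",
    "congressional race": "Campaign",
    "Iraq combat": "Foreign affairs",
    "Iraq reconstruction": "Foreign affairs",
    "stock market": "Economy",
    "jobs report": "Economy",
}

def percent_agreement(a, b):
    """Share of cases (in percent) on which two coders assigned the same code."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

coder_a = ["presidential race", "Iraq combat", "stock market", "jobs report"]
coder_b = ["congressional race", "Iraq combat", "Iraq reconstruction", "jobs report"]

fine = percent_agreement(coder_a, coder_b)
broad = percent_agreement([COLLAPSE[t] for t in coder_a], [COLLAPSE[t] for t in coder_b])
print(fine, broad)  # 50.0 75.0
```

Disagreements that cross broad categories (the third story here) survive the collapse, so collapsing improves agreement without guaranteeing it.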

Table 1: Coder Reliability for Evening News Shows

Variable              ABC (N=17)  CBS (N=10)  NBC (N=12)  PBS (N=14)
Dateline              96%         100%        100%        100%
Story Prominence      92%         100%        95%         100%
Story Origin          92%         93%         100%        95%
Story Trigger         61%         60%         78%         86%
Party Trigger         92%         93%         100%        100%
Big Story             80%         93%         100%        100%
Campaign Topic        100%        87%         100%        100%
Story Topic*          76%         80%         67%         100%
Lead Newsmaker        72%         63%         63%         66%
Story Tone            92%         100%        74%         81%
Transparency          80%         73%         93%         100%
Anonymous Sources     88%         100%        89%         100%
Data Transparency     96%         100%        93%         100%
Female Source Number  96%         100%        100%        100%
Male Source Number    90%         90%         100%        95%
Multiple Viewpoints   94%         80%         100%        86%
Reporter Speculation  82%         80%         89%         93%
Dominant Frame        94%         70%         100%        93%
Multiple Frames       88%         100%        89%         86%
*Achieved reliability after collapsing topic categories.

Table 2: Coder Reliability for Morning News Shows

Variable              ABC (N=22)  CBS (N=17)  NBC (N=30)
Dateline              97%         94%         100%
Story Prominence      93%         65%         100%
Story Origin          86%         100%        100%
Story Trigger         80%         90%         96%
Party Trigger         94%         100%        100%
Big Story             92%         100%        100%
Campaign Topic        100%        100%        100%
Story Topic*          73%         91%         95%
Lead Newsmaker        85%         84%         93%
Story Tone            88%         100%        95%
Transparency          88%         96%         100%
Anonymous Sources     83%         96%         100%
Data Transparency     97%         100%        100%
Female Source Number  95%         96%         100%
Male Source Number    85%         100%        100%
Multiple Viewpoints   100%        88%         100%
Reporter Speculation  100%        88%         100%
Dominant Frame        100%        88%         94%
Multiple Frames       91%         94%         94%
*Achieved reliability after collapsing topic categories.

Table 3: Coder Reliability Summary for Evening News, Morning News and All News Programs

Variable              Evening News (N=53)  Morning News (N=69)  All (N=122)
Dateline              99%                  97%                  98%
Story Prominence      97%                  86%                  91.5%
Story Origin          95%                  95%                  95%
Story Trigger         71%                  89%                  80%
Party Trigger         96%                  98%                  97%
Big Story             93%                  97%                  95%
Campaign Topic        97%                  100%                 98.5%
Story Topic*          81%                  86%                  83.4%
Lead Newsmaker        66%                  87%                  76.5%
Story Tone            87%                  94%                  90.5%
Transparency          87%                  95%                  91%
Anonymous Sources     94%                  93%                  93.5%
Data Transparency     97%                  99%                  98%
Female Source Number  99%                  97%                  98%
Male Source Number    94%                  95%                  94.5%
Multiple Viewpoints   90%                  96%                  93%
Reporter Speculation  86%                  96%                  91%
Dominant Frame        89%                  94%                  91.5%
Multiple Frames       91%                  93%                  92%
*Achieved reliability after collapsing topic categories.
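Per-network agreement rates can be combined into a summary figure in two common ways: a simple mean across networks, or a mean weighted by each network's number of stories (N). For the Story Prominence row of the morning-news table, the simple mean of the three network rates gives 86%, the figure shown in Table 3, while weighting by N gives about 89%. The text does not say which method was used; this sketch only illustrates the arithmetic, and the function names are hypothetical.

```python
def simple_mean(cells):
    """Unweighted mean of per-network agreement rates."""
    return sum(rate for rate, _n in cells) / len(cells)

def weighted_mean(cells):
    """Mean of per-network agreement rates weighted by each network's N."""
    total_n = sum(n for _rate, n in cells)
    return sum(rate * n for rate, n in cells) / total_n

# Story Prominence, morning news: (agreement %, N) for ABC, CBS, NBC, from Table 2.
morning_prominence = [(93, 22), (65, 17), (100, 30)]
print(round(simple_mean(morning_prominence), 1))    # 86.0
print(round(weighted_mean(morning_prominence), 1))  # 89.1
```

The two methods diverge when networks with unusual rates (here CBS, at 65%) contribute few or many stories.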


III. CABLE NEWS

Cable News Programming - Outlet Selection and Operative Dates 2004

As with the online sample, the 2004 Cable study had two components. The first was a twenty-day sample that matched the dates of the newspaper sample on Mondays through Fridays. Weekends were not included for the Internet, broadcast or cable. Again, the eligible dates ranged from January 1 to October 13, a period of 286 days. On a handful of the sample dates, special events pre-empted the evening newscasts. In such instances an alternate date for the same day of the week was selected at random.

The following dates were generated and make up the 2004 cable news sample:


January- 13, 16, 23
February- 2,* 23
March- 8, 12, 19,* 24
April- 15
May- 4,* 20*
June- 8, 9, 16*
July- 19
August- 5, 10, 12
September- 22

* Indicates cable station programming was taped continuously from 7 a.m. to 11 p.m.

In addition to the main sample, we studied five of these days more intensively to replicate the freshness variable studied in 2003. From the 20-day sample, one day for each weekday was randomly selected. These days were:

February 2

March 19

May 4

May 20

June 16

Story Procurement and Inclusion

To assess the nature of the 24-hour news cycle as presented on cable news programming, CNN, Fox News, and MSNBC were selected because they were the three most-viewed cable news channels in 2003.

For the twenty-day sample, we selected three program types to study at each network: daytime programming, the program closest to a traditional newscast, and the highest-rated prime-time talk show. The following programs were captured and analyzed:

DAYTIME PROGRAMMING

The 11 a.m. to noon hour on each network

NEWSCAST/NEWS DIGEST PROGRAMS

CNN's NewsNight with Aaron Brown

FOX's Special Report with Brit Hume

MSNBC's Countdown with Keith Olbermann

PRIME-TIME TALK PROGRAMS

CNN's Larry King Live

FOX's O'Reilly Factor

MSNBC's Hardball with Chris Matthews

For the five-day sample, all programming was captured and coded from 7 a.m. (the beginning of the morning shows) until 11 p.m. (the end of prime time), a 16-hour stretch of programming. This resulted in some 240 hours of programming.

All cable programming was procured through both videotape and transcripts, although transcripts were not available for the Fox News programming at the 11:00 a.m. hour. Transcripts were obtained through the Nexis electronic database. Videotaped programs were captured live in the Washington, D.C. market. In some instances tapes were provided to us by VMS, a commercial third-party monitoring service.

Cable News Coding Procedures

The cable news coding was conducted by faculty members, graduate students, and research staff affiliated with the Institute for Communication and Information Research at the University of Alabama. Six coders were involved throughout the coding process. All coders worked independently, without consulting one another regarding specific coding decisions.

Cable News Inter-coder Reliability Testing

As noted, three program types were studied for each of the three cable news networks. To assess reliability within and across program types, we randomly selected six of the 60 hours of daytime programming, six of the 60 hours of news-digest programming, and six of the 60 hours of prime-time talk programming. In other words, the reliability sample was stratified by program type.

The reliability sample was also stratified by network. Within the six hours for each program type we included two hours from each of the three networks.

This 18-hour sample represents 10% of the 180 hours of programming included in the study; the 6-hour sample for each program type represents 10% of the 60 hours dedicated to each of the three program types.
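The stratified design can be sketched as follows: treat each program on each sample day as one program-hour, form the nine program-type-by-network strata, and draw two hours at random from each. Treating every program as exactly one hour and using integers as stand-ins for the 20 dates are simplifying assumptions; the function name is hypothetical.

```python
import itertools
import random

NETWORKS = ["CNN", "Fox News", "MSNBC"]
PROGRAM_TYPES = ["daytime", "news digest", "prime-time talk"]

def reliability_sample(sample_days, hours_per_stratum=2, seed=None):
    """Stratify by program type and network; draw `hours_per_stratum`
    program-hours (distinct sample days) at random within each stratum."""
    rng = random.Random(seed)
    picked = []
    for ptype, network in itertools.product(PROGRAM_TYPES, NETWORKS):
        for day in rng.sample(sample_days, hours_per_stratum):
            picked.append((ptype, network, day))
    return picked

sample_days = list(range(1, 21))  # stand-ins for the 20 sample dates
universe = len(PROGRAM_TYPES) * len(NETWORKS) * len(sample_days)  # 180 program-hours
picked = reliability_sample(sample_days, seed=42)
print(len(picked), universe, f"{len(picked) / universe:.0%}")  # 18 180 10%
```

Stratifying guarantees that every network and program type appears in the reliability check, which a simple random draw of 18 hours would not.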

Percent-agreement figures were calculated for each variable that required a categorical choice among variable values.

All of the variables used in the State of the Media analysis presented here achieved at least 88% inter-coder agreement, ranging from a low of 88% for presence of journalist opinion to a high of 98% for story prominence.

IV. STATISTICAL ANALYSES IN THE CONTENT ANALYSIS

For each media subcategory (newspapers, Internet news sites, broadcast network news and cable network news), separate datasets were created and separate tabulations were constructed. Whenever comparisons were made across content categories within a single medium, the chi-square statistic was used to determine whether the observed differences were statistically significant. Data from all the outlets were also aggregated into a single file so that cross-media comparisons could be made. Again, whenever a cross-media comparison is described as showing different patterns, the statistical significance of that difference was determined with the chi-square statistic.

For much of this report, the individual news story is the unit of analysis. There are, however, selected variables where it was more informative to present analysis through a measurement of the time or words devoted to particular topics or recurring leads.

Within each universe (cable, newspapers, etc.), each case in the applicable SPSS dataset represents one story. Length is one of the measurements recorded for each case. (Note: for network and cable, this number represents seconds; for newspapers and news magazines, this number represents word count; for Internet, no volumetric analysis was applied.)

To create the volumetric tables, the number recorded in each case's Length variable was designated as that case's weight and applied to that case. The resulting weighted dataset was used to produce volumetric tables for selected variables.
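The difference between story counts and volumetric (length-weighted) shares can be shown with a few hypothetical cable stories. Weighting each story by its Length value, as described above, lets long stories count for more; the topics, durations, and function names here are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cable stories: (topic, Length in seconds).
stories = [("Iraq", 120), ("Campaign", 300), ("Iraq", 60), ("Economy", 120)]

def share_by_story(stories):
    """Percent of stories per topic: every case counts once."""
    counts = defaultdict(int)
    for topic, _length in stories:
        counts[topic] += 1
    return {t: 100.0 * c / len(stories) for t, c in counts.items()}

def share_by_time(stories):
    """Percent of total time per topic: each case is weighted by its Length."""
    seconds = defaultdict(int)
    for topic, length in stories:
        seconds[topic] += length
    total = sum(seconds.values())
    return {t: 100.0 * s / total for t, s in seconds.items()}

print(share_by_story(stories))  # {'Iraq': 50.0, 'Campaign': 25.0, 'Economy': 25.0}
print(share_by_time(stories))   # {'Iraq': 30.0, 'Campaign': 50.0, 'Economy': 20.0}
```

Here one topic accounts for half the stories but less than a third of the airtime, which is the kind of contrast the volumetric tables are meant to capture.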

Statistical Analyses

For most comparisons of how the content and structure of the news vary across media, chi-square analyses were used. Chi-square is a non-parametric statistic that examines the relationship between nominal variables, that is, variables whose values are names or categories rather than points on a numeric scale (e.g., a network variable whose values are CNN, MSNBC, and Fox News). As noted by Riffe, Lacy, & Fico3 (1998), pp. 167-168:

"The chi-square test of statistical significance is based on the assumption that the randomly sampled data appropriately described, within sampling error, the population's proportions of cases falling into the categorical values of the variables being tested.

Chi-square starts with the assumption that there is in the population only random association between the two variables, and that any sample finding to the contrary is merely a sampling artifact.

For each cell in a table linking the two variables, chi-square calculates the theoretical expected proportions based in a posited null relationship. The empirically obtained data are then compared cell by cell with the expected null-relationship proportions. Specifically the absolute value of the differences between the observed and expected values in each cell goes into the computation of the chi-square statistic. Therefore, the chi-square statistic is large when the differences between the empirical and theoretical cell frequencies are large, and small when the empirically obtained data more closely resemble the pattern of the null relationship.

This chi-square statistic has known values that permit a researcher to reject the null hypothesis (no relationship between the variables) at the standard 95% and 99% levels of probability."
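A minimal sketch of the test applied to a two-variable table of hypothetical counts. It computes the standard Pearson form of the statistic, which sums (observed - expected)^2 / expected over all cells, where each expected count is row total x column total / grand total under the null relationship. The critical values used are the standard chi-square table values for 2 degrees of freedom; the data and the function name are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under the null (no association): row total x column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical counts of stories by tone (rows) and network (columns: CNN, Fox News, MSNBC).
table = [
    [30, 20, 10],  # positive tone
    [20, 30, 40],  # negative tone
]
stat, df = chi_square(table)

# Standard table values for df = 2: 5.991 (p = .05) and 9.210 (p = .01).
CRITICAL_DF2 = {0.05: 5.991, 0.01: 9.210}
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 16.67, df = 2
print("reject null at 99% level:", stat > CRITICAL_DF2[0.01])  # True
```

In practice a statistics package would also report an exact p-value; the lookup against fixed critical values mirrors the 95% and 99% thresholds described in the quotation.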