Notes on Humphreys and Wang (2018) – Automated Text Analysis for Consumer Research

Main Topic or Phenomenon

This paper addresses the growing availability of digital text data generated by consumers and provides a comprehensive methodology for incorporating automated text analysis into consumer research. The authors focus on developing systematic approaches to analyze consumer-generated content like reviews, social media posts, blogs, and online discussions to understand consumer attitudes, interactions, and culture.

Theoretical Construct and Framework

The paper takes linguistic theory as its foundation for understanding consumer thought and behavior through language. Its core framework integrates three linguistic dimensions:

Semantics: The study of word meaning and explicit linguistic content. Used to measure consumer attention, emotion, and conceptual associations.

Pragmatics: The interaction between linguistic content and contextual factors like speaker-hearer relationships and social dynamics. Used to study interpersonal dynamics, status, power, and social influence.

Syntax: The grammatical structure and order of linguistic elements. Used to examine processing complexity, cognitive effort, and persuasiveness.

The authors develop a six-stage roadmap for automated text analysis: (1) developing research questions, (2) identifying constructs, (3) collecting data, (4) operationalizing constructs, (5) interpreting results, and (6) validating results.
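As an illustration of stage 4 (operationalizing constructs) on the semantic dimension, the sketch below measures a construct as the share of words in a text that match a dictionary. It is a minimal sketch rather than the authors' procedure: the word list POSITIVE_EMOTION, the sample review, and the function names are hypothetical, and a real study would rely on a validated dictionary.

```python
import re

# Hypothetical mini-dictionary for illustration only; not a validated word list.
POSITIVE_EMOTION = {"love", "great", "happy", "enjoy", "wonderful"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_share(text, dictionary):
    """Return the share of tokens in `text` that appear in `dictionary`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)


# Made-up review text: 3 of its 12 tokens are dictionary words.
review = "I love this blender. It works great and I enjoy using it."
print(round(dictionary_share(review, POSITIVE_EMOTION), 3))  # prints 0.25
```

Dividing by the token count keeps scores comparable across texts of different lengths, which matters when reviews range from one line to several paragraphs.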

Key Findings

  • Discovery capability: Automated text analysis can reveal patterns in language that humans cannot detect unaided, leading to discoveries about systematic relationships between constructs

  • Measurement precision: Computers can execute rules impartially to measure changes over time, compare between groups, and aggregate large amounts of text more consistently than human coders

  • Ecological validity: Text analysis provides external validity by studying naturally occurring consumer discourse, complementing laboratory experiments

  • Multi-level analysis: The methodology can span individual, dyadic, group, and cultural levels of analysis

  • Methodological guidelines: Specific recommendations for dictionary development, sampling strategies, validation procedures, and statistical analysis of sparse textual data
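One way to approach the validation step is to compare dictionary-based codes with human codes on a subsample of texts. The sketch below computes Cohen's kappa as the agreement statistic. That choice is an assumption: these notes do not record which statistic the authors recommend, and the two label lists are made-up illustration data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length lists of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each coder labeled independently at their own rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes: 1 = text expresses the construct, 0 = it does not.
human_codes = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
dictionary_codes = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(round(cohens_kappa(human_codes, dictionary_codes), 2))  # prints 0.8
```

Raw percent agreement would be 0.9 here; kappa is lower because half of that agreement would be expected by chance given how often each coder uses each label.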

Boundary Conditions and Moderators

When text analysis is inappropriate:

  • Research requiring causal inference through controlled experimentation
  • Studies of behavioral or unarticulated phenomena (e.g., response time, skin conductance)
  • Research requiring observation of actual consumer practices versus discourse
  • Analysis requiring detection of complex meanings like sarcasm or nuanced rhetorical strategies

Data quality moderators:

  • Sample size requirements (a minimum of 30 units; at least 1,000 words per person when measuring personality traits)
  • Base frequency of dictionary keywords affects reliability
  • Language complexity (character-based languages like Chinese require additional preprocessing)
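The first two data quality checks above can be sketched as a simple screening pass before any analysis. The 1,000-word threshold comes from these notes; the function names, the author labels, and the sample texts are hypothetical, and the tokenizer is a simple assumption suited to space-delimited languages only.

```python
import re

# Threshold quoted in the notes for personality-trait measurement.
MIN_WORDS_PER_PERSON = 1000


def word_count(text):
    """Count word tokens in a text."""
    return len(re.findall(r"\w+", text))


def screen_authors(texts_by_author, min_words=MIN_WORDS_PER_PERSON):
    """Keep authors whose combined text meets the word threshold.

    Returns a dict mapping each retained author to their total word count.
    """
    kept = {}
    for author, texts in texts_by_author.items():
        total = sum(word_count(t) for t in texts)
        if total >= min_words:
            kept[author] = total
    return kept


def keyword_base_rate(documents, keywords):
    """Share of documents containing at least one dictionary keyword."""
    if not documents:
        return 0.0
    hits = 0
    for doc in documents:
        tokens = set(re.findall(r"\w+", doc.lower()))
        if tokens & keywords:
            hits += 1
    return hits / len(documents)


# Hypothetical corpus: author "a" has 1,100 words, author "b" has 2.
corpus = {"a": ["word " * 600, "word " * 500], "b": ["short post"]}
print(screen_authors(corpus))  # prints {'a': 1100}

posts = ["I love it", "Too expensive", "love the price"]
print(round(keyword_base_rate(posts, {"love"}), 2))  # prints 0.67
```

A very low base rate signals that a dictionary rarely fires in the corpus, so scores built on it will rest on few observations.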

Validity constraints:

  • Selection bias in internet data (e.g., Twitter users tend to be younger and more urban than the general population)
  • Keyword search bias may miss important data due to semantic framing differences
  • Cultural products may not directly reflect individual attitudes

Building on Previous Work

The paper extends psychological text analysis methods (like LIWC) by:

  • Integrating linguistic theory with consumer behavior constructs
  • Providing comprehensive methodological guidance for the entire research process
  • Addressing unique challenges in consumer research contexts

The paper challenges the fragmented approach in existing literature by:

  • Criticizing the lack of integration between linguistic theory and consumer research applications
  • Highlighting the absence of standardized methods and reporting procedures
  • Arguing against the limited scope of most existing text analysis studies

The work builds on computational linguistics and content analysis traditions while adapting them specifically for consumer research questions and contexts.

Major Theoretical Contribution

The primary theoretical contribution is the integration of linguistic theory with consumer behavior constructs. The authors demonstrate how semantic, pragmatic, and syntactic dimensions of language can be systematically used to study four key areas in consumer research:

  1. Attention (through semantics) - measuring consumer focus, temporal orientation, and emotional states
  2. Processing (through syntax) - examining cognitive complexity, decision strategies, and persuasion mechanisms
  3. Interpersonal dynamics (through pragmatics) - studying status, power, influence, and social relationships
  4. Group and cultural characteristics (through combined dimensions) - analyzing collective attention, cultural trends, and social movements

This framework provides a theoretical bridge between language and consumer psychology that was previously missing in the field.
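To make the syntactic dimension concrete, the sketch below uses mean words per sentence as a crude proxy for syntactic complexity. This proxy is an assumption chosen for brevity, not a measure these notes attribute to the authors, and the sentence splitter is deliberately naive (it breaks on abbreviations such as "e.g.").

```python
import re


def mean_sentence_length(text):
    """Average number of words per sentence, using a naive splitter."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(lengths) / len(sentences)


# Made-up review: one 2-word sentence and one 11-word sentence.
review = "Great phone. The battery, which I doubted at first, lasts two full days."
print(mean_sentence_length(review))  # prints 6.5
```

A parser-based measure (for example, clause depth) would capture structure more faithfully; sentence length is used here only because it needs no external library.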

Major Managerial Implications

Brand positioning: Companies can use semantic analysis to map consumer perceptions and identify positioning gaps by analyzing how brands are discussed in relation to key attributes.

Customer insight discovery: Text analysis can reveal previously unknown patterns in customer feedback, such as discovering side effects in drug reviews that weren’t captured in clinical trials.

Communication strategy: Understanding how linguistic style affects persuasiveness can inform marketing message development, particularly regarding active vs. passive voice and syntactic complexity.

Trend identification: Automated analysis of social media and review data can help companies track emerging consumer concerns and cultural shifts in real time.

Product development: Analysis of expert vs. consumer reviews can reveal disconnects between what companies emphasize and what consumers actually value.
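The brand positioning idea above can be sketched as a brand-by-attribute co-occurrence table: count how often each brand is mentioned in the same text as each attribute word. The brand names, attribute words, and sample texts below are all hypothetical, and matching on single lowercase tokens is a simplifying assumption.

```python
import re


def brand_attribute_counts(documents, brands, attributes):
    """Count documents in which each brand co-occurs with each attribute.

    `brands` and `attributes` are lists of single lowercase tokens.
    """
    counts = {b: {a: 0 for a in attributes} for b in brands}
    for doc in documents:
        tokens = set(re.findall(r"[a-z']+", doc.lower()))
        for brand in brands:
            if brand not in tokens:
                continue
            for attribute in attributes:
                if attribute in tokens:
                    counts[brand][attribute] += 1
    return counts


# Hypothetical brands, attributes, and consumer posts.
docs = [
    "Acme is cheap but reliable",
    "Zenith feels premium",
    "Acme is cheap",
    "Zenith is reliable and premium",
]
table = brand_attribute_counts(docs, ["acme", "zenith"], ["cheap", "reliable", "premium"])
for brand, row in table.items():
    print(brand, row)
# prints:
# acme {'cheap': 2, 'reliable': 1, 'premium': 0}
# zenith {'cheap': 0, 'reliable': 1, 'premium': 2}
```

Cells with zero counts mark attributes that consumers do not currently associate with a brand, which is one rough way to spot positioning gaps.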

Unexplored Theoretical Factors

Individual difference moderators:

  • Consumer expertise level and how it affects linguistic expression
  • Cultural background and language proficiency effects on text analysis validity
  • Personality traits beyond what current dictionaries capture

Contextual moderators:

  • Platform-specific communication norms and their impact on linguistic patterns
  • Temporal context effects (time of day, seasonality) on language use
  • Audience size and composition effects on linguistic style

Methodological boundary conditions:

  • Cross-cultural validity of English-based dictionaries and methods
  • Effectiveness across different product categories and consumption contexts
  • Integration with other data types (visual, behavioral) for richer insights

Dynamic factors:

  • How linguistic patterns evolve as consumers gain experience with products/brands
  • The role of social influence and viral effects on language spread
  • Feedback loops between company communications and consumer language adoption

Reference

Humphreys, Ashlee and Rebecca Jen-Hui Wang (2018), “Automated Text Analysis for Consumer Research,” Journal of Consumer Research, 44 (6), 1274–1306.

Chen Xing
Founder & Data Scientist
