Technical Deep Dive: Building an EV Media Coverage Analysis Framework

NewsDataHub Staff

Posted on Jan 10, 2025

We analyzed over 11,000 articles about electric vehicles from 400 different news sources, focusing on five key models: the Tesla Model 3 and Model Y, Chevrolet Bolt EV, Ford F-150 Lightning, and Tesla Cybertruck. This article dives into the implementation details of our analysis framework.

Solution Overview

Data Collection: We fetch articles via the NewsDataHub API in monthly batches of 2,000 articles, using a single query that captures mentions of the specific EV models.

Content Processing: A vehicle taxonomy system identifies and categorizes mentions of different EV models, handling various naming conventions (e.g., “F-150 Lightning” vs. “Ford Lightning”). Vehicle mentions are tracked on a monthly basis.

Analysis & Correlation: The raw mention counts are then converted into attention share percentages.

Visualization: Using Plotly, we created interactive visualizations to track coverage trends and attention share over time. This includes line charts for temporal trends, pie charts for relative coverage distribution, and heatmaps for event correlation analysis.

This framework processed over 11,000 articles and surfaced the coverage patterns and trends summarized in this article. The full analysis is available here: https://blog.newsdatahub.com/posts/ev-news-coverage-analysis
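As a rough illustration of the visualization step, the sketch below pivots monthly attention shares into one series per vehicle and draws them as a Plotly line chart. The function names, chart labels, and sample numbers are assumptions made for this example, not the framework's actual code.

```python
def shares_to_series(attention_shares):
    """Pivot {month: {category: share}} into {category: (months, values)}."""
    months = sorted(attention_shares)  # 'YYYY-MM' strings sort chronologically
    categories = sorted({c for shares in attention_shares.values() for c in shares})
    return {
        category: (months, [attention_shares[m].get(category, 0.0) for m in months])
        for category in categories
    }


def build_line_chart(attention_shares):
    """Build a Plotly line chart with one trace per vehicle."""
    import plotly.graph_objects as go  # requires the plotly package

    fig = go.Figure()
    for category, (months, values) in shares_to_series(attention_shares).items():
        fig.add_trace(go.Scatter(x=months, y=values, mode='lines+markers', name=category))
    fig.update_layout(
        title='EV attention share over time',
        xaxis_title='Month',
        yaxis_title='Attention share (%)',
    )
    return fig


# Made-up sample shares for illustration only.
sample = {
    '2024-01': {'model_3': 55.0, 'cybertruck': 45.0},
    '2024-02': {'model_3': 100.0},
}
series = shares_to_series(sample)
print(series)
```

Calling build_line_chart(sample).show() would render the chart in an environment where Plotly is installed.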

Notable Trends And Patterns

Coverage Distribution Evolution:

In 2020, Tesla models dominated media coverage with Model 3 averaging 38.4% and Model Y averaging 31.2% of total coverage for that year.

The F-150 Lightning captured an average of 40.41% of media coverage throughout 2022.
The Cybertruck garnered an average of 38.33% of total media coverage during 2024.

Market Coverage Dynamics:

The Tesla Model 3 maintained strong media presence, consistently capturing over 30% of coverage through 2023.
Traditional automakers gained significant attention, particularly with the F-150 Lightning.
The Bolt EV saw a dramatic drop in coverage, falling from 9.79% in 2020 to 2.15% in 2024.

Media coverage varied noticeably across all models, with major events and product launches consistently triggering spikes in attention.

Technical Implementation and Architecture

Building A Keyword Detector

We created a structured classification system or “vehicle dictionary” that allows us to identify vehicle mentions in articles.

self.vehicle_taxonomy = {
    'model_y': ['tesla model y', 'model y', 'tesla y'],
    'model_3': ['tesla model 3', 'model 3', 'tesla 3', 'model 3 highland'],
    'bolt_ev': ['chevrolet bolt ev', 'chevy bolt ev', 'bolt ev'],
    'f150_lightning': ['f-150 lightning', 'ford lightning', 'f150 lightning'],
    'cybertruck': ['tesla cybertruck', 'cyber truck']
}

We chose this approach for two key reasons:

The structure enables easy addition of new vehicle variants or models without modifying the core processing logic.
Given the small size of the vehicle taxonomy, a simple dictionary is more efficient than a complex database, providing minimal overhead and easy maintenance.
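To illustrate the first point, the sketch below adds a new entry to a copy of the taxonomy and runs it through an unchanged matching helper. The match_categories helper and the rivian_r1t entry are hypothetical and exist only for this example.

```python
# Illustrative sketch: extending the taxonomy without touching matching logic.
VEHICLE_TAXONOMY = {
    'model_y': ['tesla model y', 'model y', 'tesla y'],
    'model_3': ['tesla model 3', 'model 3', 'tesla 3', 'model 3 highland'],
    'bolt_ev': ['chevrolet bolt ev', 'chevy bolt ev', 'bolt ev'],
    'f150_lightning': ['f-150 lightning', 'ford lightning', 'f150 lightning'],
    'cybertruck': ['tesla cybertruck', 'cyber truck'],
}


def match_categories(text, taxonomy):
    """Return the sorted category keys whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, keywords in taxonomy.items()
        if any(keyword in lowered for keyword in keywords)
    )


# Hypothetical new model: only the dictionary changes.
extended = {**VEHICLE_TAXONOMY, 'rivian_r1t': ['rivian r1t', 'r1t']}

headline = "Ford Lightning and Rivian R1T go head to head"
print(match_categories(headline, VEHICLE_TAXONOMY))
print(match_categories(headline, extended))
```

Because matching is plain substring search, very short keywords such as 'r1t' can also match inside unrelated words, so new entries are worth checking against sample text.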

Data Processing Implementation

Our mention extraction process scans through thousands of articles and tallies each vehicle reference.

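from collections import defaultdict

def extract_technology_mentions(self, articles):
    # mentions[month][category] -> number of articles mentioning that vehicle
    mentions = defaultdict(lambda: defaultdict(int))

    for article in articles:
        month = article['pub_date'][:7]  # keep the 'YYYY-MM' prefix
        text = f"{article['title']} {article['description']}".lower()

        for category, keywords in self.vehicle_taxonomy.items():
            # Count each article at most once per vehicle
            if any(keyword in text for keyword in keywords):
                mentions[month][category] += 1

    return mentions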

This implementation is simple, but it has several useful properties:

The defaultdict nested structure eliminates the need for explicit initialization checks, reducing cognitive complexity and potential edge cases.
The any() function short-circuits on the first match, avoiding unnecessary iterations.
Articles are processed one at a time and no intermediate copy of the dataset is built, so the working memory needed per article stays constant regardless of the total dataset size.
The mentions dictionary grows linearly with the number of unique month-vehicle combinations, not with the number of articles.
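The behavior described above can be seen on a small input. The sketch below is a standalone version of the same counting logic, run on three made-up article records. The field names follow the snippet above, while the function name count_mentions, the trimmed taxonomy, and the sample data are assumptions for illustration.

```python
from collections import defaultdict

# Trimmed, illustrative taxonomy.
TAXONOMY = {
    'model_3': ['tesla model 3', 'model 3'],
    'cybertruck': ['tesla cybertruck', 'cyber truck'],
}


def count_mentions(articles, taxonomy):
    """Count, per month and category, the articles mentioning any keyword."""
    mentions = defaultdict(lambda: defaultdict(int))
    for article in articles:
        month = article['pub_date'][:7]  # 'YYYY-MM-DD...' -> 'YYYY-MM'
        text = f"{article['title']} {article['description']}".lower()
        for category, keywords in taxonomy.items():
            # An article counts at most once per category, however many
            # of that category's keywords it contains.
            if any(keyword in text for keyword in keywords):
                mentions[month][category] += 1
    return mentions


# Made-up sample records for illustration only.
sample_articles = [
    {'pub_date': '2024-01-05', 'title': 'Tesla Cybertruck deliveries begin',
     'description': 'The Tesla Cybertruck reaches first customers.'},
    {'pub_date': '2024-01-19', 'title': 'Model 3 refresh reviewed',
     'description': 'How it compares with the Tesla Cybertruck.'},
    {'pub_date': '2024-02-02', 'title': 'Charging network update',
     'description': 'No specific vehicle is named here.'},
]

result = count_mentions(sample_articles, TAXONOMY)
print({month: dict(counts) for month, counts in result.items()})
```

Note that the February article matches no keyword, so no '2024-02' key is ever created: the result holds one entry per month-vehicle combination that actually occurred.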

Our current implementation fetches data for all vehicles at the same time; however, we could have fetched and processed data for each vehicle one at a time. This has its pros and cons:

All Vehicles at Once

Pros:

Processing all vehicles in a single query requires fewer API calls (80% reduction for 5 vehicles), ensures consistent data sampling across vehicles, and simplifies rate limiting management.

Cons:

If an API call fails when fetching data for all vehicles at once, that batch's data is missing for every vehicle being tracked.

Individual Vehicle Queries

Pros:

Individual queries allow better control over search parameters, prevent system-wide failures, and make it easy to modify the vehicle list.

Cons:

Requires significantly more API calls, potentially using up the API quota faster than combined queries.
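The trade-off in call volume can be sketched with a small planning helper. This is only an illustration: the OR-joined, quoted query syntax is an assumption rather than the documented NewsDataHub query format, and the sketch assumes one API call per query per month, ignoring pagination.

```python
def plan_queries(months, vehicles, combined=True):
    """Return (month, query) pairs, where each pair stands for one API call."""
    if combined:
        # One OR-joined query per month covers every vehicle at once.
        query = ' OR '.join(f'"{name}"' for name in vehicles)
        return [(month, query) for month in months]
    # One query per vehicle per month.
    return [(month, f'"{name}"') for month in months for name in vehicles]


months = [f'2024-{m:02d}' for m in range(1, 13)]
vehicles = ['Model 3', 'Model Y', 'Bolt EV', 'F-150 Lightning', 'Cybertruck']

combined_calls = len(plan_queries(months, vehicles, combined=True))
individual_calls = len(plan_queries(months, vehicles, combined=False))
reduction = 100 * (1 - combined_calls / individual_calls)
print(combined_calls, individual_calls, f'{reduction:.0f}%')
```

For twelve months and five vehicles this gives 12 combined calls against 60 individual ones, which matches the 80% reduction mentioned above.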

Media Attention Calculation

The system calculates the relative media attention each vehicle receives by converting raw mention counts into percentages.

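def calculate_attention_shares(self, mentions):
    # attention_shares[month][category] -> percentage of that month's mentions
    attention_shares = {}

    for month, categories in mentions.items():
        total_mentions = sum(categories.values())

        # Skip months with no mentions to avoid dividing by zero
        if total_mentions > 0:
            shares = {
                category: (count / total_mentions * 100)
                for category, count in categories.items()
            }
            attention_shares[month] = shares

    return attention_shares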

This approach normalizes the data to account for varying news volumes across different time periods.

Data Quality Assurance:

The total_mentions > 0 check guards against division by zero, so a month with no mentions is skipped rather than raising an error.
Vehicles with no mentions in a given month do not appear in that month's shares, which downstream code can treat as a 0% share.
Skipping empty months also avoids processing periods with no meaningful data.
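One way to make the 0% share of unmentioned vehicles explicit, for example before charting, is to fill in every tracked category for every month. The helper name fill_missing_shares and the sample numbers below are assumptions for illustration.

```python
def fill_missing_shares(attention_shares, categories):
    """Give every tracked category an explicit share for every month."""
    return {
        month: {category: shares.get(category, 0.0) for category in categories}
        for month, shares in attention_shares.items()
    }


# Made-up shares for illustration: 'bolt_ev' has no entry in 2024-02.
shares = {
    '2024-01': {'model_3': 60.0, 'bolt_ev': 40.0},
    '2024-02': {'model_3': 100.0},
}
filled = fill_missing_shares(shares, ['model_3', 'bolt_ev'])
print(filled['2024-02'])
```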

Statistical Considerations:

The normalization against total monthly mentions accounts for varying news volumes. This approach reveals relative attention shifts even when absolute numbers fluctuate.

For instance, if Tesla gets 50 mentions in two different months, but one month has 100 total mentions while the other has 500, the percentage analysis reveals a stark difference in relative coverage: 50% versus 10%.
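The arithmetic in this example can be checked with a few lines of Python. The month names and the 'other' bucket are made up for the illustration, and the helper mirrors the percentage formula used above.

```python
def month_shares(counts):
    """Convert raw mention counts for one month into percentage shares."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: count / total * 100 for category, count in counts.items()}


# Same absolute Tesla count, very different monthly totals.
quiet_month = {'tesla': 50, 'other': 50}    # 100 mentions in total
busy_month = {'tesla': 50, 'other': 450}    # 500 mentions in total

quiet_share = month_shares(quiet_month)['tesla']
busy_share = month_shares(busy_month)['tesla']
print(f'{quiet_share:.1f}% vs {busy_share:.1f}%')
```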

Our standardized methodology enables meaningful comparisons across time periods and reveals media attention trends. However, numbers and percentages tell only part of the story. To add depth to these statistical patterns, we use a timeline of events to understand how major industry events influence spikes in media coverage.

Event Context And Correlation Analysis

We maintain a timeline of significant industry events to help understand changes in media coverage. This timeline includes major milestones like the start of Tesla Model Y production and COVID-19's impact on the automotive industry. The full timeline is included in the complete analysis on the NewsDataHub blog.

While this timeline is not exhaustive, it captures major milestones that could potentially influence coverage patterns:

self.significant_events = {
    '2020-01': 'Tesla Model Y begins volume production',
    '2020-04': 'COVID-19 pandemic impacts automotive industry',
    # ... additional events
}

This timeline serves several purposes:

Records major events that might influence media coverage
Helps explain sudden changes in how much attention different vehicles receive
Provides context for interpreting coverage patterns
Connects media coverage to real-world industry developments
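As one possible way to put such a timeline to use, the sketch below reports how a vehicle's attention share changed in each event month compared with the previous month in the data. The function name changes_at_events and the share values are assumptions for illustration. The event text comes from the timeline above.

```python
def changes_at_events(attention_shares, events, category):
    """For each event month, report the share change vs. the previous month."""
    months = sorted(attention_shares)  # 'YYYY-MM' strings sort chronologically
    report = []
    # Pair each month with the one before it in the data (not necessarily
    # the previous calendar month if a month was skipped).
    for previous, current in zip(months, months[1:]):
        if current in events:
            before = attention_shares[previous].get(category, 0.0)
            after = attention_shares[current].get(category, 0.0)
            report.append((current, events[current], round(after - before, 2)))
    return report


events = {
    '2020-01': 'Tesla Model Y begins volume production',
    '2020-04': 'COVID-19 pandemic impacts automotive industry',
}

# Made-up attention shares for illustration only.
shares = {
    '2020-03': {'model_y': 30.0},
    '2020-04': {'model_y': 22.5},
    '2020-05': {'model_y': 25.0},
}

report = changes_at_events(shares, events, 'model_y')
print(report)
```

A shift that coincides with an event is only a prompt for closer reading. On its own it does not show that the event caused the change.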

What Did We Learn from Analyzing the Data?

Our analysis suggests that media coverage in the electric vehicle space is shaped by a complex interplay of factors. Beyond just product launches and technical specifications, we hypothesize that media narratives are influenced by company positioning, market dynamics, and broader industry trends. The data also points to the significant role of high-profile industry figures in driving media attention. Celebrity status and public visibility of tech leaders appear to amplify coverage, suggesting that personal brand and corporate visibility are deeply intertwined in today’s EV media landscape.

Bringing It All Together

Our technical framework for analyzing EV media coverage shows how structured data analysis and contextual event tracking complement each other. Our implementation choices allowed us to efficiently process large volumes of media data while maintaining both accuracy and scalability.

This architecture can serve as a blueprint for similar media analysis projects. Key takeaways from our implementation include:

Flexible data structures that make system expansion straightforward
Data normalization techniques that enable meaningful comparisons
The effective combination of numerical data and contextual event analysis

This article is derived from our comprehensive analysis, Making Headlines in the Electric Vehicle World: A Data-Driven Analysis of Industry News Coverage.