AI-Powered Stock News Sentiment Analyzer
Full-stack AI pipeline that scrapes real-time S&P 500 press releases, runs them through multi-model GPT sentiment analysis, and produces actionable buy/sell signals on a color-coded dashboard.
Timeline: ~4 months — Oct 2023 to Jan 2024
Key Results
- 500+ companies (S&P 500 Coverage): full S&P 500 index loaded from a curated JSON dataset for symbol selection
- 3 models (AI Models Integrated): GPT-4o (sentiment), GPT-4 (filtering), GPT-3.5-turbo (decisions), plus experimental Gemini support
- Every 1–2 min (Automation Frequency): bridge:run polls every minute; scrap:engine processes every two minutes, with overlap protection on both
- 3 stages (Pipeline Stages): Scrape → Filter (relevance check) → Analyze (sentiment scoring), fully automated per stock symbol
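The cadence above maps directly onto Laravel's task scheduler. A minimal sketch, assuming the commands are registered in app/Console/Kernel.php (command names and frequencies are the project's; the exact registration is an assumption):

```php
<?php
// Scheduler sketch for the two commands described above. everyMinute(),
// everyTwoMinutes(), and withoutOverlapping() are standard Laravel scheduler
// methods; withoutOverlapping() skips a run while the previous one is active.
use Illuminate\Console\Scheduling\Schedule;

protected function schedule(Schedule $schedule): void
{
    $schedule->command('bridge:run')
        ->everyMinute()
        ->withoutOverlapping();   // skip if the previous sync is still running

    $schedule->command('scrap:engine')
        ->everyTwoMinutes()
        ->withoutOverlapping();   // one stock per run, never concurrent
}
```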
The Problem
Retail investors and traders monitoring S&P 500 stocks lacked a fast, automated way to consume breaking press releases and understand their likely impact on stock prices — manually reading dozens of articles per symbol per day was slow, inconsistent, and error-prone.
- Traders wasted hours manually browsing Yahoo Finance press releases across multiple stock symbols every day.
- Human sentiment interpretation of news is subjective and inconsistent — the same article might get different readings on different days.
- No existing tool combined real-time press release scraping with AI-driven sentiment scoring into a single actionable dashboard.
- Irrelevant articles (commentary, analysis pieces, tangentially related news) polluted the signal, requiring manual filtering before analysis.
The Solution
Built an end-to-end pipeline that uses headless browser automation to scrape Yahoo Finance press releases, filters them for relevance via GPT-4, sends each article through GPT-4o for structured sentiment analysis (positive/negative/neutral), and presents results in a color-coded web dashboard with buy/sell signals.
1. Built a headless browser scraping engine using Laravel Dusk (Selenium/ChromeDriver) to navigate Yahoo Finance press release pages for each stock symbol, extracting article titles, links, publish timestamps, and current stock prices.
2. Integrated the OpenAI GPT-4o API via a dedicated AiService with custom system prompts — the AI returns structured JSON with impact (+1/0/−1), illustration, and deep_thinking fields for each article.
3. Added a GPT-4-powered article relevance filter to pre-screen articles, discarding commentary and tangentially related content so that only breaking news passes through to sentiment analysis.
4. Designed a relational schema (reports → stocks → cards) with status tracking and a draft table to deduplicate previously processed article links.
5. Created two Artisan commands: bridge:run (syncs unprocessed reports from production every minute) and scrap:engine (processes one stock per run every two minutes), both scheduled with overlap protection.
6. Implemented a split local-processing / remote-serving architecture — scraping and AI analysis run locally, then results are pushed to the production server via a webhook endpoint (api/webhook/receive-data).
7. Built a Blade-based dashboard with Select2 multi-select for S&P 500 company selection, a reports table showing processing status, and a news report view with color-coded impact cards (green/orange/red).
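The structured-output contract in step 2 can be sketched as a plain request builder. The model name and the impact/illustration/deep_thinking fields come from the project; the function name and system prompt wording are illustrative assumptions:

```php
<?php
// Sketch of the request AiService might build for GPT-4o sentiment analysis.
// The impact/illustration/deep_thinking fields are the project's contract;
// the prompt wording and function name are assumptions.
function buildSentimentRequest(string $title, string $body): array
{
    return [
        'model' => 'gpt-4o',
        // Ask the API to guarantee a JSON object in the reply.
        'response_format' => ['type' => 'json_object'],
        'messages' => [
            [
                'role' => 'system',
                'content' => 'You are a financial analyst. For the press release below, '
                    . 'return JSON with: impact (+1 positive, 0 neutral, -1 negative), '
                    . 'illustration (one-line summary), deep_thinking (reasoning).',
            ],
            ['role' => 'user', 'content' => "{$title}\n\n{$body}"],
        ],
    ];
}
```

The returned array would then be POSTed to the Chat Completions endpoint with a Bearer token, and the reply's message content parsed with json_decode into the card fields.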
Architecture Decisions
Key technical decisions made during the project and the reasoning behind them.
Local Processing + Production Sync via Webhooks
Reasoning
The scraping engine uses headless Chrome (Laravel Dusk), which is resource-intensive and unreliable on shared hosting. By running scraping and AI analysis locally and syncing results to production via webhooks, the system avoids server limitations while keeping the public-facing app lightweight.
Outcome
Enabled heavy browser automation and AI API calls without production server constraints. The bridge:run command polls for new reports every minute and scrap:engine processes them locally.
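The production side of this sync is a single webhook route. A minimal sketch, assuming a Card model and an upsert keyed on the article link (the endpoint path and Sanctum auth are the project's; the payload shape and model are assumptions):

```php
<?php
// routes/api.php (sketch): the production-side endpoint that receives locally
// processed results. The path is the project's; payload shape, the Card model,
// and the upsert logic are assumptions.
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;
use App\Models\Card;

Route::post('/webhook/receive-data', function (Request $request) {
    // Upsert each analyzed article card, keyed by its article link,
    // so a re-sent batch does not create duplicates.
    foreach ($request->input('cards', []) as $card) {
        Card::updateOrCreate(['link' => $card['link']], $card);
    }
    return response()->json(['status' => 'ok']);
})->middleware('auth:sanctum'); // Sanctum guards the endpoint, per the stack
```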
Multi-Model GPT Pipeline (GPT-4o + GPT-4 + GPT-3.5-turbo)
Reasoning
Different tasks required different accuracy/cost tradeoffs. GPT-4o provided the best structured output for sentiment analysis, GPT-4 was accurate enough for binary relevance filtering, and GPT-3.5-turbo handled lighter decision tasks cost-effectively.
Outcome
AI returns structured impact scores that drive buy/sell signals and color-coded cards. Custom prompts are stored in a database prompts table, making them configurable per report without code changes.
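The impact field drives the dashboard directly. A minimal mapping sketch — the +1/0/−1 values and green/orange/red colors are the project's; the function name and the "hold" label are assumptions:

```php
<?php
// Sketch: map the structured impact score returned by the model to a
// dashboard signal and card color. The score values and colors come from
// the project; the signal labels are assumptions.
function impactToCard(int $impact): array
{
    return match ($impact) {
        1  => ['signal' => 'buy',  'color' => 'green'],
        0  => ['signal' => 'hold', 'color' => 'orange'],
        -1 => ['signal' => 'sell', 'color' => 'red'],
        default => throw new InvalidArgumentException("Unexpected impact: {$impact}"),
    };
}
```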
Laravel Dusk over HTTP-Based Scraping
Reasoning
Yahoo Finance press release pages render content dynamically via JavaScript. Traditional HTTP + DOM parsing couldn't access JS-rendered content. Laravel Dusk with headless Chrome handles SPAs and dynamic content natively.
Outcome
Reliable extraction of press release titles, links, datetime stamps, and stock prices from JavaScript-rendered Yahoo Finance pages.
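The Dusk flow might look roughly like this — the URL pattern and CSS selectors are assumptions, and waitFor is what makes the JS-rendered list reachable at all:

```php
<?php
// Sketch of a Dusk scrape for one symbol. The Yahoo URL pattern and the
// selectors are assumptions; waitFor() is the key difference from plain
// HTTP scraping, since it blocks until JavaScript has rendered the list.
use Laravel\Dusk\Browser;

$this->browse(function (Browser $browser) use ($symbol, &$articles) {
    $browser->visit("https://finance.yahoo.com/quote/{$symbol}/press-releases")
            ->waitFor('li.stream-item', 15); // wait up to 15s for rendering

    // Evaluate JS in the page to pull title/link pairs in one round trip;
    // script() returns an array of results, one per statement.
    $articles = $browser->script(
        "return Array.from(document.querySelectorAll('li.stream-item h3 a'))"
        . ".map(a => ({title: a.textContent.trim(), link: a.href}));"
    )[0];
});
```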
Configurable Prompt System via Database
Reasoning
Different analysis contexts require different AI prompts. Rather than hardcoding prompts, a prompts table stores primary and secondary prompts linked to reports via foreign key, allowing per-report prompt customization.
Outcome
Prompts can be updated in the database without redeployment — supports A/B testing different analysis strategies per stock batch.
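A migration consistent with the description above might look like this — the primary/secondary prompts and the report foreign key come from the text; the exact column names and timestamps are assumptions:

```php
<?php
// Migration sketch for the configurable prompts table. Cascading delete
// matches the schema behavior described in the tech stack; column names
// beyond the two prompts and the foreign key are assumptions.
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('prompts', function (Blueprint $table) {
            $table->id();
            $table->foreignId('report_id')->constrained()->cascadeOnDelete();
            $table->text('primary_prompt');
            $table->text('secondary_prompt')->nullable();
            $table->timestamps();
        });
    }
};
```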
The Tech Stack
- Laravel (PHP 8.0+) — Artisan commands for scheduled scraping, Eloquent ORM for report/stock/prompt models, Blade templating, Laravel Sanctum API auth, task scheduling with overlap protection
- OpenAI multi-model AI pipeline — GPT-4o for primary sentiment analysis returning structured JSON, GPT-4 for article relevance filtering, GPT-3.5-turbo for lighter decision tasks
- Laravel Dusk — headless browser automation (Selenium/ChromeDriver) for scraping Yahoo Finance press release pages and extracting JS-rendered stock data
- ChromeDriver — headless browsing for navigating dynamically rendered Yahoo Finance pages
- Relational database — reports, stocks, prompts, and draft tables with foreign key relationships and cascading deletes
- Blade — server-side templating for the dashboard: Select2 multi-select dropdowns, color-coded impact cards, reports table
- Responsive UI framework for the dashboard layout, report tables, and news card components
- Financial data API integration for fetching portfolios, stock quotes, watchlists, symbol searches, and news articles with date range filtering
The Impact
A production AI-powered stock analysis pipeline covering all 500+ S&P 500 companies with a 3-stage automated pipeline (scrape → filter → analyze) running every 1–2 minutes.
Related Reading
Deep dives and comparisons related to the technologies used in this project.
Scaling Laravel APIs: From 500 to 10,000 Concurrent Users
Practical techniques for scaling Laravel beyond its reputation — queue-based processing, Redis caching strategies, and database optimization patterns from the Avidnote project.
Integrating Plaid API with Django for Secure Financial Data Access
A step-by-step guide to integrating Plaid's financial data API with a Django REST Framework backend — covering Link token exchange, webhook handling, and secure credential storage.
Laravel vs. Django for Backend APIs — A Developer's Honest Comparison
Senior backend developer compares Laravel (PHP) and Django (Python) for APIs. Real benchmarks, development speed, hosting costs, and when to use each framework.
Hiring a Developer on Upwork vs. Direct Freelance: Which Is Better?
Compare hiring a developer on Upwork vs. hiring directly. We break down cost, risk, communication, and quality so you can choose the best path for your project.