Changelog

0.8.0 — 2026-04-20 (Manual Company Entry)

Standalone company creation and safer manual entity linking.

Added

Add Company form: New card on the Companies list page lets users create a standalone company with an optional type. Duplicate names redirect to the existing record; new entries redirect to the freshly created company's detail page.
POST /companies/add endpoint backing the new form.

Fixed

Manual company/person linking on project pages: The "Add Company" and "Add Person" forms on project detail previously ran typed names through the fuzzy entity resolver (resolve_company), which could silently map a new name to an unrelated existing company via short substring matches (partial_ratio × 0.9 reaching the 90% threshold). Manual entry now uses exact-name lookup, falling back to insert-new. The resolver remains in place for LLM-extracted names, where fuzzy matching is desired.
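For illustration, the exact-name lookup with insert-new fallback described above might look roughly like the sketch below. The companies table layout, the get_or_create_company name, and the whitespace trimming are assumptions made for the example, not the project's actual code.

```python
import sqlite3


def get_or_create_company(conn: sqlite3.Connection, name: str) -> tuple[int, bool]:
    """Return (company_id, created) using exact-name matching only.

    Hypothetical helper: the table and column names are assumed.
    """
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("company name must not be empty")

    # Exact equality only: "Sun" must never resolve to "Sun Energy".
    row = conn.execute(
        "SELECT id FROM companies WHERE name = ?", (cleaned,)
    ).fetchone()
    if row is not None:
        return row[0], False

    # No exact match, so fall back to inserting a new record.
    cur = conn.execute("INSERT INTO companies (name) VALUES (?)", (cleaned,))
    conn.commit()
    return cur.lastrowid, True
```

With this shape, a short name that is a substring of an existing company always creates its own record, which is the behavior the fix is after for manually typed names.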

0.7.0 — 2026-04-16 (Article Management)

Manual article import, article-project linking, video content detection, and pagination.

Added

Manual URL import: Paste an article URL on the Articles page to fetch, extract, and analyze it with Sonnet — skips keyword filter and Haiku classification (user has already vetted relevance). Shows project creation/update summary on completion.
Article-project linking: Search and link articles to existing projects from the article detail page. Includes autocomplete project search and unlink controls.
Video content detection: Detects YouTube/Vimeo embeds and video tags during article processing. New content_type field on articles (article, video, mixed). Filterable on the articles page.
Articles pagination: Replaced hardcoded 100-article limit with paginated browsing (50 per page) with prev/next navigation.
Articles relevance filter: Toggle between "All Articles" and "Linked to Projects" on the articles page.
Project search API: GET /articles/api/search-projects?q=... endpoint for autocomplete lookups.
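As a rough sketch of the 50-per-page browsing mentioned in the pagination entry above, the prev/next arithmetic could be computed as follows. The page_window helper and its return shape are hypothetical; only the page size of 50 comes from this changelog.

```python
import math

PAGE_SIZE = 50  # page size stated in the changelog entry


def page_window(total: int, page: int, page_size: int = PAGE_SIZE) -> dict:
    """Compute LIMIT/OFFSET and prev/next page numbers for a listing.

    Hypothetical helper; out-of-range page numbers are clamped.
    """
    last_page = max(1, math.ceil(total / page_size))
    page = min(max(page, 1), last_page)
    return {
        "page": page,
        "limit": page_size,
        "offset": (page - 1) * page_size,
        "prev": page - 1 if page > 1 else None,
        "next": page + 1 if page < last_page else None,
    }
```

The limit and offset values would feed a LIMIT ? OFFSET ? query, and a None value for prev or next would hide the corresponding navigation link.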

Changed

Extraction refactor: Shared _process_extraction() helper in articles routes — used by both manual import and re-analyze, eliminating code duplication.
Pipeline content_type: RSS collector and pipeline runner now detect and store content_type for all collected articles.

Schema

Migration 003: content_type TEXT DEFAULT 'article' column on articles table.

Housekeeping

Added .pytest_cache/ and data/backups/ to .gitignore
Removed stale pip install artifacts (stray files named =22.0 and =3.10)
Archived completed Session 4 plan to sessions/archive/

0.6.0 — 2026-03-20 (Branding & Public Pages)

Parjana branding, GaiaOps attribution, public homepage, and changelog page.

Added

Parjana favicon: Downloaded from parjanaengineering.com, referenced in all pages via base.html
Parjana logo: Icon in dashboard nav bar (24px) and login page header (64px)
GaiaOps footer: "Created by GaiaOps — Multiply Your Environmental Impact" on all pages with link to gaiaops.io
Public homepage: Anonymous visitors to / see a landing page with SEO meta tags (title, description), Parjana logo, project description, and login link. Authenticated users still see the dashboard.
Changelog page: /changelog route (public, no auth) renders CHANGELOG.md to HTML via the markdown library. Linked from footer.
SEO controls: <meta name="robots" content="noindex"> on all dashboard pages. Public homepage is the only indexed page.

Changed

/ route: No longer requires authentication — shows public homepage for anonymous visitors, dashboard for authenticated users
Context processor: Skips DB query for nav counts on anonymous requests

Dependencies

Added markdown>=3.3 for changelog rendering

Assets

dashboard/static/img/favicon.png — Parjana stacked layers icon
dashboard/static/img/parjana-logo-icon.jpg — Parjana icon (standalone)
dashboard/static/img/parjana-logo-full.webp — Parjana full logo with text

0.5.0 — 2026-03-20 (Production Hardening)

Production reliability improvements: gunicorn WSGI server, daily database backups.

Added

Gunicorn WSGI server: Replaced Flask dev server with gunicorn (--workers 1 --timeout 120). Eliminates dev server warning in production.
Daily database backups: APScheduler job at 04:00 UTC copies SQLite DB to /data/backups/solarscout-YYYY-MM-DD.db with 7-day retention and auto-pruning.
Pre-deploy testing checklist: Documented in session notes.

Changed

App initialization: run_dashboard.py refactored so DB init, app creation, and scheduler start happen at module load — works for both gunicorn import and direct execution.
Scheduler: Now runs two jobs — daily pipeline (12:00 UTC) and daily backup (04:00 UTC).
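A minimal sketch of what the daily backup job could do, assuming a plain file copy and treating "7-day retention" as "keep the seven newest dated files". The backup_database name and that pruning rule are assumptions; only the solarscout-YYYY-MM-DD.db naming comes from the entry above.

```python
import shutil
from datetime import date
from pathlib import Path

RETENTION = 7  # number of dated backups to keep (assumed reading of "7-day retention")


def backup_database(db_path: Path, backup_dir: Path, today: date) -> Path:
    """Copy the SQLite file to a dated backup and prune older copies."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"solarscout-{today.isoformat()}.db"
    shutil.copy2(db_path, target)

    # ISO dates sort lexicographically, so the oldest backups come first.
    backups = sorted(backup_dir.glob("solarscout-*.db"))
    for stale in backups[:-RETENTION]:
        stale.unlink()
    return target
```

A scheduler job would call this once a day with date.today(). If the database can be written to while the copy runs, sqlite3.Connection.backup() is a safer alternative to a raw file copy.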

0.4.0 — 2026-03-20 (Production Deployment)

First production deployment to Railway with custom domain.

Added

Production hosting: Deployed to Railway Hobby Plan with SQLite on persistent volume
Custom domain: Live at parjanasolarscout.app with auto-provisioned SSL via Railway/Let's Encrypt
Run Pipeline button: Dashboard home page has a "Run Pipeline Now" button with status polling, concurrent-run prevention, and auto-refresh on completion
APScheduler integration: Daily pipeline runs at 12:00 UTC via in-process background scheduler — no separate cron service needed
Pipeline HTTP endpoint: /run-pipeline endpoint (POST, auth required) for triggering pipeline runs programmatically
Pipeline status endpoint: /pipeline-status returns whether a pipeline run is currently in progress
BACKLOG.md: Backlog file for tracking planned features and known issues across sessions

Changed

Default Sonnet model: Updated from claude-sonnet-4-5-20250514 to claude-sonnet-4-6

Infrastructure

GitHub repo: ross-gaiaops852/solar-scout (private)
Railway web service auto-deploys from main branch
Volume mounted at /data for SQLite persistence across deploys
Environment variables managed via Railway dashboard (not .env in production)

First Production Pipeline Run

305 articles collected from 7 feeds
142 passed keyword filter, 54 passed Haiku classification
48 new projects created, 51 companies identified, 11 people tracked
Estimated API cost: $1.41
Duration: ~31 minutes

0.3.0 — 2026-03-19 (Dashboard v2)

Dashboard usability improvements and activity tracking.

Added

Activity log: All manual edits to project fields and outreach status changes are logged chronologically. Displayed at the bottom of each project detail page.
People page: New top-level "People" tab — browse all tracked people with their company and project associations, with search/filter.
Review queue redesign: Side-by-side comparison of existing record vs incoming data. Differing fields highlighted. Links to view full existing record before deciding.

Improved

Key People company links: Company names in project Key People section now hyperlink to the company record.
Company People → Project links: Key People on company detail pages now show which project they're associated with, hyperlinked.
Review queue context: Shows structured field comparison instead of raw JSON blob.

Fixed

Intelligence extractor crash: .format() on extraction prompt was interpreting JSON schema braces as template variables (KeyError). Switched to .replace().
Article text lost on re-analysis failure: Text updates and API calls shared a transaction — if re-analysis failed, pasted text was rolled back. Text is now committed before attempting extraction.
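The extractor crash above is easy to reproduce in isolation. The sketch below uses a made-up prompt template (PROMPT_TEMPLATE and both function names are hypothetical) to show why .format() fails on a prompt containing literal JSON braces and why .replace() on a single placeholder does not.

```python
# Hypothetical prompt: literal JSON braces plus one real placeholder.
PROMPT_TEMPLATE = (
    'Return JSON shaped like {"project_name": "...", "capacity_mw": 0}.\n'
    "Article:\n{article_text}"
)


def render_with_format(article_text: str) -> str:
    # str.format() parses the JSON braces as a replacement field named
    # '"project_name"' and raises KeyError because no such argument exists.
    return PROMPT_TEMPLATE.format(article_text=article_text)


def render_with_replace(article_text: str) -> str:
    # str.replace() touches only the exact placeholder text, so the
    # JSON example in the prompt is left as written.
    return PROMPT_TEMPLATE.replace("{article_text}", article_text)
```

Doubling the literal braces ({{ and }}) would also keep .format() working, at the cost of making the embedded JSON schema harder to read and edit.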

Schema

Migration 002: activity_log table with project_id, field_name, old/new values, timestamp.

0.1.0 — 2026-03-03 (MVP Alpha)

First working end-to-end pipeline run.

Added

Data collection: RSS collector pulling from 7 feeds (Solar Power World, PV Magazine, Utility Dive, PR Newswire, GlobeNewsWire, 2 Google News queries)
Keyword filter: Two-tier regex matching (solar terms + development signals) eliminates irrelevant articles before any API calls
Haiku relevance classifier: Sends filtered articles to Claude Haiku for relevance scoring (50MW+ US solar projects)
Sonnet intelligence extractor: Full structured extraction — project details, companies, people, timelines, stormwater relevance, outreach urgency
Entity resolution: Fuzzy matching for projects (85% threshold) and companies (90% threshold) with ambiguous matches (70-85%) routed to review queue
Pipeline runner: Orchestrates collect → filter → classify → extract → resolve → store with per-run stats logging
Dashboard: Flask app with login, pipeline status, projects list (filterable/sortable), project detail with stormwater design window, companies, articles, review queue
Health endpoint: /health returns JSON stats without auth (Railway health checks)
Database: SQLite schema with 9 tables, indexes on common query columns
Tests: Keyword filter and entity resolver test suites
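The project thresholds in the entity resolution entry above amount to a three-way routing decision. A minimal sketch, assuming scores on a 0-100 scale and that a score of exactly 85 counts as a match (the changelog does not say which side the boundary falls on); route_project_match is a hypothetical name.

```python
MATCH_THRESHOLD = 85  # project match threshold from the changelog
REVIEW_FLOOR = 70     # lower bound of the ambiguous band


def route_project_match(score: float) -> str:
    """Map a fuzzy-match score (0-100) to a resolver decision."""
    if score >= MATCH_THRESHOLD:
        return "match"   # confident enough to merge into the existing project
    if score >= REVIEW_FLOOR:
        return "review"  # ambiguous: send to the review queue
    return "new"         # too dissimilar: create a new project
```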

Fixed

Relevance classifier prompt crash — .format() was interpreting JSON braces as template variables
Entity resolver crash — sqlite3.Row objects don't support .get(); converted to dicts at resolver boundary
Entity resolver developer matching — _developer field was referenced but never populated; now queries project_companies table
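The sqlite3.Row fix above can be sketched as follows: rows are converted to plain dicts where they leave the database layer, so downstream code can use .get() with defaults. The table layout and the fetch_companies name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support row["name"] and row.keys(), but not .get()
conn.execute(
    "CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, company_type TEXT)"
)
conn.execute("INSERT INTO companies (name) VALUES ('Example Solar LLC')")


def fetch_companies(conn: sqlite3.Connection) -> list[dict]:
    """Return companies as plain dicts (hypothetical boundary helper)."""
    rows = conn.execute("SELECT id, name, company_type FROM companies").fetchall()
    # dict(row) works because sqlite3.Row exposes keys() and key-based access.
    return [dict(row) for row in rows]


companies = fetch_companies(conn)
```

After the conversion, calls such as companies[0].get("company_type") behave like any other dictionary lookup.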

Improved

Full-text fetching parallelized across domains (ThreadPoolExecutor) — previously sequential with 2s delay per article
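One way to parallelize across domains while staying polite to each site is to give every domain its own worker that fetches its URLs one at a time. The sketch below assumes the 2s delay is kept between requests to the same domain, which the entry above does not state; fetch_all and its parameters are hypothetical, and the fetch callable is injected so the example stays independent of any HTTP library.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def fetch_all(urls, fetch, delay=2.0, max_workers=8):
    """Fetch URLs in parallel across domains, sequentially within a domain."""
    by_domain = defaultdict(list)
    for url in urls:
        by_domain[urlparse(url).netloc].append(url)
    if not by_domain:
        return {}

    def fetch_domain(domain_urls):
        results = {}
        for i, url in enumerate(domain_urls):
            if i:
                time.sleep(delay)  # pause only between hits on the same domain
            results[url] = fetch(url)
        return results

    merged = {}
    workers = min(max_workers, len(by_domain))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(fetch_domain, by_domain.values()):
            merged.update(partial)
    return merged
```

With this layout the total wall time is bounded by the busiest domain rather than by the total article count.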