Changelog

0.14.0 — 2026-06-26 (need Layer)

Third session of the M2 Sourcebed schema-alignment arc. Adds the derived need layer: a project at a given lifecycle stage has needs derived from the project alone, before any supplier is attached. This layer is the engine of the future proactive-sales motion and makes "stormwater" a first-class derived opportunity rather than a column on the project. Standing it up now is cheap because there is exactly one need type today (stormwater_infiltration, Parjana's) and nothing to migrate. Additive and zero-LLM. projects.stormwater_design_window is kept and parallel-read until the contract session (s16).

Added

  • need table: need_id, project_id, need_type, relevant_at_stage, design_window, status (default unmatched), keyed by a unique (project_id, need_type) constraint, plus indexes on project_id and status. design_window is a supplier-neutral need attribute, so it re-homes here from projects.stormwater_design_window (not onto the Parjana match in s14).
  • processors/need_generator.py: the canonical zero-LLM lookup project_type × lifecycle_stage → [need_type]. Today it holds one need type, stormwater_infiltration, which is hot across permitting, verification, and construction. The need is generated once a project reaches its earliest hot (activating) stage and persists at every later stage. The vocabulary grows from real match outcomes; no speculative need types are pre-populated. generate_for_project() is idempotent and never calls resolve_project. On re-runs it preserves status (so a match set in s14 is never reset) while refreshing the re-homed design_window.
  • Forward-write wiring: the pipeline generates/refreshes needs after entity resolution (new Stage 5), and a manual stage advance on the project detail page derives the need immediately. New models.get_project_needs() and models.upsert_need() helpers.
  • Read-only Needs readout on the project detail page: need type, relevant stage, status badge, and design window. In s14, status will flip to matched when a match row is created.
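A minimal sketch of the derive-then-upsert flow described above, assuming the funnel order in FUNNEL (the real vocabulary lives in processors/stage_vocab.py and its full stage list is not given here), a hypothetical NEED_RULES table, and the need columns listed above. The point it illustrates is the SQLite upsert: because status is absent from the DO UPDATE SET list, a re-run refreshes design_window but leaves a matched status alone. The relevant_at_stage and design_window values are illustrative.

```python
import sqlite3

# ASSUMPTION: illustrative funnel order; the authoritative list lives in
# processors/stage_vocab.py and may contain stages not shown here.
FUNNEL = ["identified", "planning", "permitting", "verification",
          "construction", "operational"]

# Hypothetical rule table: project_type -> [(need_type, activating_stage)].
NEED_RULES = {
    "utility_solar": [("stormwater_infiltration", "permitting")],
}


def derive_needs(project_type, lifecycle_stage):
    """Return the need types for a project at a given stage.

    A need appears at its activating stage and persists at every later stage.
    """
    if lifecycle_stage not in FUNNEL:
        return []
    position = FUNNEL.index(lifecycle_stage)
    return [
        need_type
        for need_type, activating in NEED_RULES.get(project_type, [])
        if position >= FUNNEL.index(activating)
    ]


def upsert_need(conn, project_id, need_type, relevant_at_stage, design_window):
    """Insert a need, or refresh design_window only; status is never touched."""
    conn.execute(
        """
        INSERT INTO need (project_id, need_type, relevant_at_stage, design_window)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (project_id, need_type)
        DO UPDATE SET design_window = excluded.design_window
        """,
        (project_id, need_type, relevant_at_stage, design_window),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE need (
        need_id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        need_type TEXT NOT NULL,
        relevant_at_stage TEXT,
        design_window TEXT,
        status TEXT NOT NULL DEFAULT 'unmatched',
        UNIQUE (project_id, need_type)
    )
    """
)
for need in derive_needs("utility_solar", "permitting"):
    upsert_need(conn, 1, need, "permitting", "Q3 2026")
conn.execute("UPDATE need SET status = 'matched' WHERE project_id = 1")  # as s14 would
upsert_need(conn, 1, "stormwater_infiltration", "permitting", "Q4 2026")  # re-run
print(conn.execute("SELECT status, design_window FROM need").fetchall())
# [('matched', 'Q4 2026')]
```

The unique (project_id, need_type) constraint is what makes the upsert possible: SQLite requires the ON CONFLICT target to match a unique index or constraint.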

Schema

  • Migration 008: creates need (+ indexes) and backfills the stormwater_infiltration need for every project at or after the activating stage (permitting onward, in funnel order), re-homing design_window from the legacy column. Idempotent via INSERT OR IGNORE on the unique constraint; the stage set mirrors need_generator.derive_needs. projects.stormwater_design_window retained (dropped in s16).
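A minimal sketch of an idempotent migration of this shape, assuming the column names shown and an illustrative stage set (the real one mirrors need_generator.derive_needs). Using 'permitting' as relevant_at_stage for backfilled rows and the index names are also assumptions. Running it twice leaves the row count unchanged because INSERT OR IGNORE skips rows that would violate the unique (project_id, need_type) constraint.

```python
import sqlite3

# ASSUMPTION: illustrative stage set. The real migration mirrors
# need_generator.derive_needs (permitting onward, in funnel order).
ACTIVATED_STAGES = ("permitting", "verification", "construction", "operational")

CREATE_SQL = """
CREATE TABLE IF NOT EXISTS need (
    need_id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    need_type TEXT NOT NULL,
    relevant_at_stage TEXT,
    design_window TEXT,
    status TEXT NOT NULL DEFAULT 'unmatched',
    UNIQUE (project_id, need_type)
);
CREATE INDEX IF NOT EXISTS idx_need_project_id ON need (project_id);
CREATE INDEX IF NOT EXISTS idx_need_status ON need (status);
"""

# INSERT OR IGNORE skips rows that would violate UNIQUE (project_id, need_type),
# so re-running the migration adds nothing. 'permitting' as relevant_at_stage
# is an assumption; design_window is re-homed from the legacy column.
BACKFILL_SQL = (
    "INSERT OR IGNORE INTO need (project_id, need_type, relevant_at_stage, design_window) "
    "SELECT id, 'stormwater_infiltration', 'permitting', stormwater_design_window "
    "FROM projects WHERE lifecycle_stage IN ("
    + ", ".join("?" for _ in ACTIVATED_STAGES)
    + ")"
)


def migrate_008(conn):
    conn.executescript(CREATE_SQL)
    conn.execute(BACKFILL_SQL, ACTIVATED_STAGES)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (id INTEGER PRIMARY KEY, lifecycle_stage TEXT, "
    "stormwater_design_window TEXT)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(1, "planning", None), (2, "permitting", "Q2 2026"), (3, "operational", None)],
)
migrate_008(conn)
migrate_008(conn)  # idempotent: the second run inserts nothing
print(conn.execute("SELECT project_id, design_window FROM need ORDER BY project_id").fetchall())
# [(2, 'Q2 2026'), (3, None)]
```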

0.13.0 — 2026-06-23 (solar_attributes Extension)

Second session of the M2 Sourcebed schema-alignment arc. Relocates solar-specific project fields off the spine into a 1:1 solar_attributes extension table. This is the correctness test for the Sourcebed spine/extension split: a future project type should need only a new extension table, with zero change to the spine. Additive and forward-write only, with zero LLM passes over the stored corpus. Legacy projects.capacity_mw is kept and parallel-written until the contract session (s16).

Added

  • solar_attributes extension table: nameplate_capacity_mw, interconnection_queue_position, ppa_status, panel_tech, keyed 1:1 to projects(id). nameplate_capacity_mw is backfilled from the legacy column for every project; the three net-new fields are added nullable and filled forward only (new ingests) — historical rows stay null. New models.upsert_solar_attributes() helper.
  • Forward extraction: the Sonnet extractor now emits a per-project solar_attributes object (interconnection_queue_position, ppa_status, panel_tech) populated from article text. It also emits legal_entity / legal_status on the spine, but is intentionally instructed to leave them null: reliable population needs a dedicated entity-research agent that does not exist yet, so the fields are wired but not filled.

Changed

  • Capacity source of truth is now solar_attributes. The pipeline (entity resolver), manual project edits, and project merges all write capacity into the extension and parallel-write the legacy projects.capacity_mw for rollback safety. All dashboard capacity reads (projects list + sort, project detail, autocomplete search, review queue) join solar_attributes and prefer the extension value, falling back to the legacy column while both exist.
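The read-side fallback and the parallel write can be sketched in SQLite as below, assuming solar_attributes uses a project_id column as its 1:1 key (the changelog says only that it is keyed to projects(id)) and a hypothetical set_capacity() helper in place of models.upsert_solar_attributes(). COALESCE picks the extension value when present and the legacy projects.capacity_mw otherwise; the LEFT JOIN keeps projects that have no extension row.

```python
import sqlite3

# Effective capacity: prefer the extension, fall back to the legacy column.
CAPACITY_SQL = """
SELECT p.id,
       COALESCE(sa.nameplate_capacity_mw, p.capacity_mw) AS effective_mw
FROM projects AS p
LEFT JOIN solar_attributes AS sa ON sa.project_id = p.id
ORDER BY effective_mw DESC, p.id
"""


def set_capacity(conn, project_id, mw):
    """Hypothetical helper: write the extension, then parallel-write the legacy column."""
    conn.execute(
        "INSERT INTO solar_attributes (project_id, nameplate_capacity_mw) VALUES (?, ?) "
        "ON CONFLICT (project_id) DO UPDATE SET "
        "nameplate_capacity_mw = excluded.nameplate_capacity_mw",
        (project_id, mw),
    )
    conn.execute("UPDATE projects SET capacity_mw = ? WHERE id = ?", (mw, project_id))


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id INTEGER PRIMARY KEY, capacity_mw REAL);
    CREATE TABLE solar_attributes (
        project_id INTEGER PRIMARY KEY REFERENCES projects(id),
        nameplate_capacity_mw REAL
    );
    INSERT INTO projects VALUES (1, 100), (2, 50), (3, NULL);
    """
)
set_capacity(conn, 2, 200.0)
print(conn.execute(CAPACITY_SQL).fetchall())
# [(2, 200.0), (1, 100.0), (3, None)]
```

Once the legacy column is dropped in s16, the COALESCE would reduce to the extension column alone. In SQLite, NULLs sort last under DESC, so projects with no known capacity fall to the bottom of the list.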

Fixed

  • Project delete / merge had been broken since v0.12.0: with foreign keys enforced, deleting a project failed (FOREIGN KEY constraint failed) because s11's stage_history backfill left a referencing row that delete_project/merge_projects never cleared. Both paths now clean up stage_history (re-parented on merge to preserve the moat) and the new solar_attributes row before removing the project.

Schema

  • Migration 007: creates solar_attributes and backfills nameplate_capacity_mw from projects.capacity_mw (pure SQL, idempotent via INSERT OR IGNORE on the PK). projects.capacity_mw retained (dropped in s16).

0.12.0 — 2026-06-22 (Type-Agnostic Spine + stage_history)

First session of the M2 Sourcebed schema-alignment arc. Adds the type-agnostic project spine above the existing solar columns, a generic lifecycle_stage, and timestamped stage-transition history. Additive only — current_phase is kept and parallel-written until a later contract session.

Added

  • Project spine: project_type (default utility_solar), owner_entity, legal_entity, legal_status, lifecycle_stage, and signals columns on projects. owner_entity is backfilled from the resolved developer link; the new project-creation path sets the spine fields directly.
  • Generic lifecycle stage: current_phase maps to a funnel-position lifecycle_stage (identified → planning → permitting → … → operational) through a single shared vocabulary (processors/stage_vocab.py). The pipeline and manual edits write both columns through it so they cannot diverge; pipeline advances stay monotonic, manual edits are authoritative.
  • stage_history: new table recording timestamped lifecycle_stage transitions. Live transitions (pipeline resolution + manual phase edits) are logged as source='forward' — the unbackfillable elapsed-time moat. Reconstructed rows are marked source='backfill_approximate' and must be excluded from any elapsed-time model.
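A minimal sketch of the two rules above (monotonic pipeline advances versus authoritative manual edits), plus an elapsed-time calculation that ignores backfilled rows. The funnel order, the dict-based project and history records, and the function names are assumptions for illustration; the real vocabulary lives in processors/stage_vocab.py.

```python
from datetime import datetime, timezone

# ASSUMPTION: illustrative funnel order; see processors/stage_vocab.py for the real one.
FUNNEL = ["identified", "planning", "permitting", "verification",
          "construction", "operational"]


def next_stage(current, proposed, manual):
    """Manual edits are authoritative; pipeline advances never move backwards."""
    if manual or current is None:
        return proposed
    return proposed if FUNNEL.index(proposed) > FUNNEL.index(current) else current


def apply_stage(project, proposed, manual, history, at):
    """Update project['lifecycle_stage'] and log a forward row if it changed."""
    new = next_stage(project["lifecycle_stage"], proposed, manual)
    if new != project["lifecycle_stage"]:
        history.append({"to_stage": new, "source": "forward", "at": at})
        project["lifecycle_stage"] = new
    return new


def elapsed_days(history, from_stage, to_stage):
    """Days between first forward arrivals at two stages; backfill rows ignored.

    Assumes history is in chronological order.
    """
    first_arrival = {}
    for row in history:
        if row["source"] == "forward":
            first_arrival.setdefault(row["to_stage"], row["at"])
    if from_stage not in first_arrival or to_stage not in first_arrival:
        return None
    return (first_arrival[to_stage] - first_arrival[from_stage]).days


def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)


project = {"lifecycle_stage": "planning"}
history = [{"to_stage": "planning", "source": "backfill_approximate", "at": utc(2025, 1, 1)}]
apply_stage(project, "permitting", False, history, utc(2026, 6, 1))
apply_stage(project, "planning", False, history, utc(2026, 6, 5))  # ignored: backwards
apply_stage(project, "construction", False, history, utc(2026, 7, 1))
print(project["lifecycle_stage"], elapsed_days(history, "permitting", "construction"))
# construction 30
print(elapsed_days(history, "planning", "permitting"))
# None: planning only has a backfill row
```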

Schema

  • Migration 006: adds the six spine columns to projects, creates stage_history (+ index), and backfills lifecycle_stage, owner_entity, and one approximate stage_history row per project. Deterministic, pure-SQL, idempotent. current_phase retained.

0.11.0 — 2026-06-22 (Bulk Entity Management & Lead Filtering)

Bulk cleanup of noisy entities on project pages, plus lead/role classification to filter non-decision-makers at extraction time.

Added

  • Bulk remove people & companies: Checkbox columns with "select all" on the Key People and Companies tables on the project detail page. A bulk action bar appears when rows are selected; "Remove selected" deletes in a single transaction behind a confirmation dialog and a summary flash. Bulk removals are recorded in activity_log as one counted entry (not one per row). New delete_people_batch() and unlink_project_companies_batch() model helpers; new remove-people-bulk / remove-companies-bulk routes (company rows encode the composite id:role key).
  • Lead/role classification: The Sonnet extractor now classifies each extracted person as lead, support, or noise. noise people (county/city officials, board members, residents, landowners, journalists, etc.) are dropped at storage time; the count is tracked per run in pipeline_runs.people_skipped_noise. The project detail page hides non-lead people by default behind a "Show all" toggle and badges support / noise / unknown rows.

Schema

  • Migration 005: adds role_classification TEXT DEFAULT 'unknown' to people and people_skipped_noise INTEGER DEFAULT 0 to pipeline_runs, plus an index on people(role_classification).

0.10.0 — 2026-05-12 (Project Management)

Project deletion, AI timeline regeneration, merge audit trail, and CRM tracking checkboxes.

Added

  • Delete projects: Single-project delete button on the project detail page (with confirmation dialog) and bulk delete from the project list. Cascades through project_articles, project_companies, project_timeline; people records are preserved with project_id nulled. Deletions are recorded in activity_log with project_id=NULL and the deleted project label preserved for audit.
  • Merge activity log: merge_projects() now writes a project_merged entry to the target project for each source merged in, surfaced in the existing project detail Activity Log table. Source project's edit history is re-parented onto the target so no history is lost.
  • AI timeline regeneration: New "Regenerate Timeline" button on the project detail page consolidates events across all linked articles via Sonnet. Choice of replace (default) or append modes. New processors/timeline_regenerator.py handles the API call; route shows a loading state while the 10–20s call runs.
  • CRM tracking checkboxes: in_network_engine and in_master_db boolean flags on both projects and people. Project checkboxes appear on the project detail page; people checkboxes appear on the people list page. AJAX toggles update immediately. Project list page has a new filter dropdown ("Not in NetworkEngine", "Not in Master DB") to surface unexported records.

Schema

  • Migration 004: adds in_network_engine and in_master_db columns to projects and people, plus indexes on the project columns.

0.9.0 — 2026-05-11 (Pipeline Fixes & Optimization)

Cheaper, fresher, broader-coverage daily pipeline runs.

Added

  • Configurable lookback window: COLLECTION_LOOKBACK_DAYS (default 7) drops feed entries older than the window before processing. Articles with no parseable publish date are kept.
  • Early URL deduplication: Pipeline now checks collected URLs against the DB before the full-text fetch step. Reposts and known articles no longer trigger HTTP fetches. New bulk models.get_existing_urls() helper.
  • Developer-targeted feeds: Two new Google News RSS queries covering top utility-scale solar developers (SOLV Energy, McCarthy Building, Quanta Services, Nextracker, NextEra, First Solar, Lightsource bp, EDF Renewables, Invenergy, Clearway, Origis). Catches project announcements before they hit the trade press.
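The two early filters can be sketched as pure functions over feed entries, assuming each entry has already been normalized to a dict with a url and a timezone-aware published datetime (None when the date could not be parsed). get_existing_urls() below is a hypothetical stand-in for models.get_existing_urls(), whose real signature is not shown here, and the articles.url column name is assumed.

```python
import os
import sqlite3
from datetime import datetime, timedelta, timezone

LOOKBACK_DAYS = int(os.environ.get("COLLECTION_LOOKBACK_DAYS", "7"))


def within_lookback(entries, now, days=LOOKBACK_DAYS):
    """Drop entries older than the window; keep entries with no parseable date."""
    cutoff = now - timedelta(days=days)
    return [e for e in entries if e["published"] is None or e["published"] >= cutoff]


def get_existing_urls(conn, urls):
    """Hypothetical stand-in for models.get_existing_urls(): one query per batch."""
    urls = list(urls)
    if not urls:
        return set()
    placeholders = ", ".join("?" for _ in urls)
    sql = f"SELECT url FROM articles WHERE url IN ({placeholders})"
    return {row[0] for row in conn.execute(sql, urls)}


def dedup_before_fetch(conn, entries):
    """Drop known URLs and in-batch reposts before any full-text HTTP fetch."""
    known = get_existing_urls(conn, {e["url"] for e in entries})
    seen, fresh = set(), []
    for e in entries:
        if e["url"] in known or e["url"] in seen:
            continue
        seen.add(e["url"])
        fresh.append(e)
    return fresh


now = datetime(2026, 5, 11, tzinfo=timezone.utc)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (url TEXT UNIQUE)")
conn.execute("INSERT INTO articles VALUES ('https://example.com/known')")
entries = [
    {"url": "https://example.com/a", "published": now - timedelta(days=2)},
    {"url": "https://example.com/old", "published": now - timedelta(days=30)},
    {"url": "https://example.com/undated", "published": None},
    {"url": "https://example.com/a", "published": now - timedelta(days=1)},  # repost
    {"url": "https://example.com/known", "published": now},
]
to_fetch = dedup_before_fetch(conn, within_lookback(entries, now, days=7))
print([e["url"] for e in to_fetch])
# ['https://example.com/a', 'https://example.com/undated']
```

Running the date filter first shrinks the URL batch sent to the database, and both steps finish before any full-text HTTP request is made.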

Fixed

  • PR Newswire feed: The direct prnewswire.com/rss/... endpoint returned malformed XML (feedparser bozo error). Replaced with a Google News RSS query scoped to site:prnewswire.com plus solar keywords — same source coverage via a reliable feed.

Changed

  • RSSCollector.collect_with_full_text() is now a thin wrapper over collect() + enrich_with_full_text(). The pipeline runner uses the split form so it can dedup between the two steps.

0.8.0 — 2026-04-20 (Manual Company Entry)

Standalone company creation and safer manual entity linking.

Added

  • Add Company form: New card on the Companies list page lets users create a standalone company with an optional type. Duplicate names redirect to the existing record; new entries redirect to the freshly created company's detail page.
  • POST /companies/add endpoint backing the new form.

Fixed

  • Manual company/person linking on project pages: The "Add Company" and "Add Person" forms on project detail previously ran typed names through the fuzzy entity resolver (resolve_company), which could silently map a new name to an unrelated existing company via short substring matches (partial_ratio × 0.9 crossing the 90% threshold). Manual entry now uses exact-name lookup, falling back to insert-new — the resolver remains in place for LLM-extracted names where fuzzy matching is desired.
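The failure mode is easy to see in numbers: a short typed name that appears verbatim inside a longer existing name scores a partial_ratio of 100, and 100 × 0.9 = 90 meets the 90% threshold. A minimal sketch of the replacement behavior follows, assuming a companies table with name and type columns (hypothetical names); trimming surrounding whitespace is also an assumption. The same get-or-create shape fits the Add Company form from 0.8.0, where a duplicate name resolves to the existing record.

```python
import sqlite3


def get_or_create_company(conn, name, company_type=None):
    """Exact-name lookup, falling back to insert-new. No fuzzy matching.

    Returns (company_id, created). Stripping whitespace is an assumption.
    """
    name = name.strip()
    row = conn.execute("SELECT id FROM companies WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0], False
    cur = conn.execute(
        "INSERT INTO companies (name, type) VALUES (?, ?)", (name, company_type)
    )
    return cur.lastrowid, True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT UNIQUE, type TEXT)")
conn.execute("INSERT INTO companies (name) VALUES ('Solar Energy Partners')")
print(get_or_create_company(conn, "Solar"))                  # (2, True)
print(get_or_create_company(conn, "Solar Energy Partners"))  # (1, False)
```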

0.7.0 — 2026-04-16 (Article Management)

Manual article import, article-project linking, video content detection, and pagination.

Added

  • Manual URL import: Paste an article URL on the Articles page to fetch, extract, and analyze it with Sonnet — skips keyword filter and Haiku classification (user has already vetted relevance). Shows project creation/update summary on completion.
  • Article-project linking: Search and link articles to existing projects from the article detail page. Includes autocomplete project search and unlink controls.
  • Video content detection: Detects YouTube/Vimeo embeds and video tags during article processing. New content_type field on articles (article, video, mixed). Filterable on the articles page.
  • Articles pagination: Replaced hardcoded 100-article limit with paginated browsing (50 per page) with prev/next navigation.
  • Articles relevance filter: Toggle between "All Articles" and "Linked to Projects" on the articles page.
  • Project search API: GET /articles/api/search-projects?q=... endpoint for autocomplete lookups.

Changed

  • Extraction refactor: Shared _process_extraction() helper in articles routes — used by both manual import and re-analyze, eliminating code duplication.
  • Pipeline content_type: RSS collector and pipeline runner now detect and store content_type for all collected articles.

Schema

  • Migration 003: content_type TEXT DEFAULT 'article' column on articles table.

Housekeeping

  • Added .pytest_cache/ and data/backups/ to .gitignore
  • Removed stale pip install artifacts (=22.0, =3.10)
  • Archived completed Session 4 plan to sessions/archive/

0.6.0 — 2026-03-20 (Branding & Public Pages)

Parjana branding, GaiaOps attribution, public homepage, and changelog page.

Added

  • Parjana favicon: Downloaded from parjanaengineering.com, referenced in all pages via base.html
  • Parjana logo: Icon in dashboard nav bar (24px) and login page header (64px)
  • GaiaOps footer: "Created by GaiaOps — Multiply Your Environmental Impact" on all pages with link to gaiaops.io
  • Public homepage: Anonymous visitors to / see a landing page with SEO meta tags (title, description), Parjana logo, project description, and login link. Authenticated users still see the dashboard.
  • Changelog page: /changelog route (public, no auth) renders CHANGELOG.md to HTML via the markdown library. Linked from footer.
  • SEO controls: <meta name="robots" content="noindex"> on all dashboard pages. Public homepage is the only indexed page.

Changed

  • / route: No longer requires authentication — shows public homepage for anonymous visitors, dashboard for authenticated users
  • Context processor: Skips DB query for nav counts on anonymous requests

Dependencies

  • Added markdown>=3.3 for changelog rendering

Assets

  • dashboard/static/img/favicon.png — Parjana stacked layers icon
  • dashboard/static/img/parjana-logo-icon.jpg — Parjana icon (standalone)
  • dashboard/static/img/parjana-logo-full.webp — Parjana full logo with text

0.5.0 — 2026-03-20 (Production Hardening)

Production reliability improvements: gunicorn WSGI server, daily database backups.

Added

  • Gunicorn WSGI server: Replaced Flask dev server with gunicorn (--workers 1 --timeout 120). Eliminates dev server warning in production.
  • Daily database backups: APScheduler job at 04:00 UTC copies SQLite DB to /data/backups/solarscout-YYYY-MM-DD.db with 7-day retention and auto-pruning.
  • Pre-deploy testing checklist: Documented in session notes.
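The backup-and-prune job can be sketched with Python's sqlite3 online backup API (Connection.backup). That is a swapped-in choice: the changelog only says the job copies the DB, and a plain file copy also works but can capture a write in progress. The solarscout-YYYY-MM-DD.db naming follows the changelog; reading "7-day retention" as "keep the 7 most recent daily files" and the function names are assumptions. Registering the job with APScheduler at 04:00 UTC is left out.

```python
import sqlite3
import tempfile
from datetime import date, datetime, timedelta, timezone
from pathlib import Path

PREFIX = "solarscout-"
RETENTION_DAYS = 7


def prune_backups(backup_dir, today, keep_days=RETENTION_DAYS):
    """Delete dated backups older than keep_days (keeps today plus keep_days - 1 prior days)."""
    cutoff = today - timedelta(days=keep_days)
    for path in Path(backup_dir).glob(PREFIX + "*.db"):
        try:
            stamp = date.fromisoformat(path.stem[len(PREFIX):])
        except ValueError:
            continue  # not one of our dated backups
        if stamp <= cutoff:
            path.unlink()


def backup_database(db_path, backup_dir, today=None):
    """Snapshot db_path to backup_dir/solarscout-YYYY-MM-DD.db, then prune."""
    today = today or datetime.now(timezone.utc).date()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{PREFIX}{today.isoformat()}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # online backup API: consistent copy while the app runs
    finally:
        dst.close()
        src.close()
    prune_backups(backup_dir, today)
    return target


work = Path(tempfile.mkdtemp())
db = work / "solarscout.db"
c = sqlite3.connect(db)
c.execute("CREATE TABLE t (x)")
c.commit()
c.close()
(work / "backups").mkdir()
for old in ("2026-03-12", "2026-03-13", "2026-03-14"):
    (work / "backups" / f"{PREFIX}{old}.db").touch()
backup_database(db, work / "backups", date(2026, 3, 20))
remaining = sorted(p.name for p in (work / "backups").iterdir())
print(remaining)
# ['solarscout-2026-03-14.db', 'solarscout-2026-03-20.db']
```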

Changed

  • App initialization: run_dashboard.py refactored so DB init, app creation, and scheduler start happen at module load — works for both gunicorn import and direct execution.
  • Scheduler: Now runs two jobs — daily pipeline (12:00 UTC) and daily backup (04:00 UTC).

0.4.0 — 2026-03-20 (Production Deployment)

First production deployment to Railway with custom domain.

Added

  • Production hosting: Deployed to Railway Hobby Plan with SQLite on persistent volume
  • Custom domain: Live at parjanasolarscout.app with auto-provisioned SSL via Railway/Let's Encrypt
  • Run Pipeline button: Dashboard home page has a "Run Pipeline Now" button with status polling, concurrent-run prevention, and auto-refresh on completion
  • APScheduler integration: Daily pipeline runs at 12:00 UTC via in-process background scheduler — no separate cron service needed
  • Pipeline HTTP endpoint: /run-pipeline endpoint (POST, auth required) for triggering pipeline runs programmatically
  • Pipeline status endpoint: /pipeline-status returns whether a pipeline run is currently in progress
  • BACKLOG.md: Backlog file for tracking planned features and known issues across sessions

Changed

  • Default Sonnet model: Updated from claude-sonnet-4-5-20250514 to claude-sonnet-4-6

Infrastructure

  • GitHub repo: ross-gaiaops852/solar-scout (private)
  • Railway web service auto-deploys from main branch
  • Volume mounted at /data for SQLite persistence across deploys
  • Environment variables managed via Railway dashboard (not .env in production)

First Production Pipeline Run

  • 305 articles collected from 7 feeds
  • 142 passed keyword filter, 54 passed Haiku classification
  • 48 new projects created, 51 companies identified, 11 people tracked
  • Estimated API cost: $1.41
  • Duration: ~31 minutes

0.3.0 — 2026-03-19 (Dashboard v2)

Dashboard usability improvements and activity tracking.

Added

  • Activity log: All manual edits to project fields and outreach status changes are logged chronologically. Displayed at the bottom of each project detail page.
  • People page: New top-level "People" tab — browse all tracked people with their company and project associations, with search/filter.
  • Review queue redesign: Side-by-side comparison of existing record vs incoming data. Differing fields highlighted. Links to view full existing record before deciding.

Improved

  • Key People company links: Company names in project Key People section now hyperlink to the company record.
  • Company People → Project links: Key People on company detail pages now show which project they're associated with, hyperlinked.
  • Review queue context: Shows structured field comparison instead of raw JSON blob.

Fixed

  • Intelligence extractor crash: .format() on extraction prompt was interpreting JSON schema braces as template variables (KeyError). Switched to .replace().
  • Article text lost on re-analysis failure: Text updates and API calls shared a transaction — if re-analysis failed, pasted text was rolled back. Text is now committed before attempting extraction.
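A minimal reproduction of the .format() crash and the .replace() fix, using an illustrative prompt rather than the extractor's real one. str.format treats every brace pair as a replacement field, so a literal JSON schema in the template is parsed as a field named "project_name" and the lookup fails with KeyError.

```python
# A prompt template that embeds a literal JSON schema (illustrative content).
PROMPT = """Extract the project as JSON matching this schema:
{"project_name": "string", "capacity_mw": "number"}

Article:
{article_text}
"""

# str.format parses {"project_name": ...} as a replacement field and raises KeyError.
try:
    PROMPT.format(article_text="...")
    format_error = None
except KeyError as exc:
    format_error = exc


def render_prompt(article_text):
    """Substitute only the named placeholder; every other brace stays literal."""
    return PROMPT.replace("{article_text}", article_text)


print(render_prompt("Acme Solar filed for a 200 MW permit."))
```

Escaping literal braces as {{ and }} would also satisfy str.format, but .replace() avoids escaping every brace in a long schema.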

Schema

  • Migration 002: activity_log table with project_id, field_name, old/new values, timestamp.

0.1.0 — 2026-03-03 (MVP Alpha)

First working end-to-end pipeline run.

Added

  • Data collection: RSS collector pulling from 7 feeds (Solar Power World, PV Magazine, Utility Dive, PR Newswire, GlobeNewswire, 2 Google News queries)
  • Keyword filter: Two-tier regex matching (solar terms + development signals) eliminates irrelevant articles before any API calls
  • Haiku relevance classifier: Sends filtered articles to Claude Haiku for relevance scoring (50MW+ US solar projects)
  • Sonnet intelligence extractor: Full structured extraction — project details, companies, people, timelines, stormwater relevance, outreach urgency
  • Entity resolution: Fuzzy matching for projects (85% threshold) and companies (90% threshold) with ambiguous matches (70-85%) routed to review queue
  • Pipeline runner: Orchestrates collect → filter → classify → extract → resolve → store with per-run stats logging
  • Dashboard: Flask app with login, pipeline status, projects list (filterable/sortable), project detail with stormwater design window, companies, articles, review queue
  • Health endpoint: /health returns JSON stats without auth (Railway health checks)
  • Database: SQLite schema with 9 tables, indexes on common query columns
  • Tests: Keyword filter and entity resolver test suites
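The threshold routing in the entity-resolution entry can be sketched as follows. The 85%, 90%, and 70% cut-offs come from that entry. Applying the same 70% review floor to companies is an assumption, so their review band here runs 70 to 90. difflib.SequenceMatcher is a swapped-in stand-in scorer (the resolver's actual scorer includes a partial_ratio term, per 0.8.0), and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Thresholds from the changelog; the shared 70% review floor is an assumption
# for companies, whose review band therefore runs 70 to 90.
ACCEPT = {"project": 85.0, "company": 90.0}
REVIEW_FLOOR = 70.0


def similarity(a, b):
    """0-100 score. difflib stands in for the resolver's real fuzzy scorer."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route(score, kind):
    if score >= ACCEPT[kind]:
        return "match"   # link to the existing record
    if score >= REVIEW_FLOOR:
        return "review"  # ambiguous: send to the review queue
    return "new"         # create a new record


def resolve(name, existing, kind):
    """Return (decision, best_candidate) for an extracted name."""
    if not existing:
        return "new", None
    best = max(existing, key=lambda cand: similarity(name, cand))
    return route(similarity(name, best), kind), best


print(resolve("Sunny Ridge Solar", ["Sunny Ridge Solar", "Blue Mesa Solar"], "project"))
# ('match', 'Sunny Ridge Solar')
```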

Fixed

  • Relevance classifier prompt crash — .format() was interpreting JSON braces as template variables
  • Entity resolver crash — sqlite3.Row objects don't support .get(); converted to dicts at resolver boundary
  • Entity resolver developer matching — _developer field was referenced but never populated; now queries project_companies table

Improved

  • Full-text fetching parallelized across domains (ThreadPoolExecutor) — previously sequential with 2s delay per article
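A sketch of the per-domain fan-out with concurrent.futures.ThreadPoolExecutor. It assumes (the entry does not say so) that the 2 s politeness delay is kept between requests to the same domain while different domains proceed concurrently. fetch is an injected callable standing in for the real full-text fetcher, and max_workers=8 is an arbitrary default.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def fetch_all(urls, fetch, delay_s=2.0, max_workers=8):
    """Fetch URLs concurrently across domains, sequentially within each domain."""
    by_domain = defaultdict(list)
    for url in urls:
        by_domain[urlparse(url).netloc].append(url)

    def fetch_domain(domain_urls):
        results = {}
        for i, url in enumerate(domain_urls):
            if i:  # politeness delay between requests to the same host
                time.sleep(delay_s)
            results[url] = fetch(url)
        return results

    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for part in pool.map(fetch_domain, by_domain.values()):
            merged.update(part)
    return merged


urls = ["https://a.example/1", "https://b.example/1", "https://a.example/2"]
print(fetch_all(urls, fetch=lambda u: len(u), delay_s=0))
```

Grouping by host keeps the old per-site pacing while total wall time becomes roughly that of the busiest domain rather than the sum across all articles.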