Third session of the M2 Sourcebed schema-alignment arc. Adds the derived need layer: a project at a given lifecycle stage has needs derived from the project alone, before any supplier is attached — the engine of the future proactive-sales motion and the layer that makes "stormwater" a first-class derived opportunity rather than a column on the project. Today there is exactly one need type (stormwater_infiltration, Parjana's), which is why standing it up now — nothing to migrate — is cheap. Additive and zero-LLM. projects.stormwater_design_window is kept and parallel-read until the contract session (s16).
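A minimal sketch of the need layer described above, assuming a SQLite `need` table with the columns named in this session's entries. The funnel order in `STAGE_ORDER`, the `design_window` text format, and the signature of `generate_for_project()` are illustrative assumptions, not the project's actual code.

```python
import sqlite3

# Assumed funnel order, for illustration only; the real vocabulary
# lives in processors/stage_vocab.py.
STAGE_ORDER = ["identified", "planning", "permitting", "verification",
               "construction", "operational"]

# project_type -> {need_type: activating (earliest hot) stage}.
ACTIVATING_STAGE = {"utility_solar": {"stormwater_infiltration": "permitting"}}


def derive_needs(project_type, lifecycle_stage):
    """Zero-LLM lookup: need types whose activating stage has been reached."""
    if lifecycle_stage not in STAGE_ORDER:
        return []
    rank = STAGE_ORDER.index(lifecycle_stage)
    return [need_type
            for need_type, start in ACTIVATING_STAGE.get(project_type, {}).items()
            if rank >= STAGE_ORDER.index(start)]


def generate_for_project(conn, project_id, project_type, lifecycle_stage, design_window):
    """Idempotent: inserts each need once, then refreshes design_window only."""
    for need_type in derive_needs(project_type, lifecycle_stage):
        start = ACTIVATING_STAGE[project_type][need_type]
        # The unique (project_id, need_type) constraint makes re-runs a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO need "
            "(project_id, need_type, relevant_at_stage, design_window) "
            "VALUES (?, ?, ?, ?)",
            (project_id, need_type, start, design_window))
        # Refresh the re-homed attribute; status is deliberately not touched.
        conn.execute(
            "UPDATE need SET design_window = ? WHERE project_id = ? AND need_type = ?",
            (design_window, project_id, need_type))


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE need (
    need_id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    need_type TEXT NOT NULL,
    relevant_at_stage TEXT,
    design_window TEXT,
    status TEXT DEFAULT 'unmatched',
    UNIQUE (project_id, need_type))""")

generate_for_project(conn, 1, "utility_solar", "permitting", "2025-Q3")
conn.execute("UPDATE need SET status = 'matched' WHERE project_id = 1")  # as s14 would
generate_for_project(conn, 1, "utility_solar", "construction", "2025-Q4")  # re-run

rows = conn.execute(
    "SELECT need_type, relevant_at_stage, design_window, status FROM need").fetchall()
```

After the re-run there is still one row, its `design_window` is refreshed, and the `matched` status set in between survives.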
need table: need_id, project_id, need_type, relevant_at_stage, design_window, status (default unmatched), keyed by a unique (project_id, need_type) constraint, plus indexes on project_id and status. design_window is a supplier-neutral need attribute, so it re-homes here from projects.stormwater_design_window (not onto the Parjana match in s14).
processors/need_generator.py: the canonical zero-LLM lookup project_type × lifecycle_stage → [need_type]. Today: stormwater_infiltration, hot across permitting / verification / construction; generated once a project reaches its earliest hot (activating) stage and persisting thereafter. The vocabulary grows from real match outcomes; no speculative need types are pre-populated. generate_for_project() is idempotent and never calls resolve_project; it preserves status on re-runs (so a match set in s14 is never reset) while refreshing the re-homed design_window.
models.get_project_needs() and models.upsert_need() helpers.
status flips to matched in s14 when a match row is created.
need (+ indexes) and backfills the stormwater_infiltration need for every project at or after the activating stage (permitting onward, in funnel order), re-homing design_window from the legacy column. Idempotent via INSERT OR IGNORE on the unique constraint; the stage set mirrors need_generator.derive_needs. projects.stormwater_design_window retained (dropped in s16).

Second session of the M2 Sourcebed schema-alignment arc. Relocates solar-specific project fields off the spine into a 1:1 solar_attributes extension table, the correctness test for the Sourcebed spine/extension split (a future project type = a new extension table, zero change here). Additive and forward-write only: zero LLM passes over the stored corpus. Legacy projects.capacity_mw is kept and parallel-written until the contract session (s16).
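The extension-table split can be sketched with SQLite as below. This is an illustration under assumptions: the `project_id` key column name, the column types, and the sample rows are hypothetical; the attribute names, the INSERT OR IGNORE backfill, and the prefer-extension read with fallback to the legacy column come from this session's entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, capacity_mw REAL);
CREATE TABLE solar_attributes (
    project_id INTEGER PRIMARY KEY REFERENCES projects(id),  -- 1:1 with projects
    nameplate_capacity_mw REAL,
    interconnection_queue_position TEXT,  -- net-new, nullable, filled forward only
    ppa_status TEXT,
    panel_tech TEXT
);
INSERT INTO projects VALUES (1, 'Alpha', 120.0), (2, 'Beta', NULL);
""")

# Pure-SQL backfill; the primary key makes a second run a no-op.
BACKFILL = """
INSERT OR IGNORE INTO solar_attributes (project_id, nameplate_capacity_mw)
SELECT id, capacity_mw FROM projects
"""
conn.execute(BACKFILL)
conn.execute(BACKFILL)
backfilled = conn.execute("SELECT COUNT(*) FROM solar_attributes").fetchone()[0]

# A later correction lands in the extension only, to show which value wins.
conn.execute(
    "UPDATE solar_attributes SET nameplate_capacity_mw = 125.0 WHERE project_id = 1")
# A project with no extension row yet exercises the legacy fallback.
conn.execute("INSERT INTO projects VALUES (3, 'Gamma', 50.0)")

# Reads join the extension and prefer its value, falling back to the legacy column.
capacities = conn.execute("""
    SELECT p.id, COALESCE(sa.nameplate_capacity_mw, p.capacity_mw)
    FROM projects p
    LEFT JOIN solar_attributes sa ON sa.project_id = p.id
    ORDER BY p.id
""").fetchall()
```

Project 1 reads the extension value, project 2 stays null in both places, and project 3 falls back to the legacy column.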
solar_attributes extension table: nameplate_capacity_mw, interconnection_queue_position, ppa_status, panel_tech, keyed 1:1 to projects(id). nameplate_capacity_mw is backfilled from the legacy column for every project; the three net-new fields are added nullable and filled forward only (new ingests), so historical rows stay null. New models.upsert_solar_attributes() helper.
solar_attributes object (interconnection_queue_position, ppa_status, panel_tech) populated from article text, plus forward-null legal_entity / legal_status on the spine. The latter two are intentionally instructed to stay null: reliable population needs a dedicated entity-research agent that does not exist yet, so they are wired but not filled.
solar_attributes. The pipeline (entity resolver), manual project edits, and project merges all write capacity into the extension and parallel-write the legacy projects.capacity_mw for rollback safety. All dashboard capacity reads (projects list + sort, project detail, autocomplete search, review queue) join solar_attributes and prefer the extension value, falling back to the legacy column while both exist.
stage_history backfill left a referencing row that delete_project/merge_projects never cleared (FOREIGN KEY constraint failed). Both paths now clean up stage_history (re-parented on merge to preserve the moat) and the new solar_attributes row before removing the project.
solar_attributes and backfills nameplate_capacity_mw from projects.capacity_mw (pure SQL, idempotent via INSERT OR IGNORE on the PK). projects.capacity_mw retained (dropped in s16).

First session of the M2 Sourcebed schema-alignment arc. Adds the type-agnostic project spine above the existing solar columns, a generic lifecycle_stage, and timestamped stage-transition history. Additive only: current_phase is kept and parallel-written until a later contract session.
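A hedged sketch of how a shared vocabulary can keep current_phase and lifecycle_stage in step while logging transitions. The phase names, the full funnel order, the history record shape, and the `apply_phase()` helper are hypothetical; the real mapping lives in processors/stage_vocab.py and is not reproduced in this changelog.

```python
from datetime import datetime, timezone

# Hypothetical vocabulary, for illustration only.
STAGE_ORDER = ["identified", "planning", "permitting", "verification",
               "construction", "operational"]
PHASE_TO_STAGE = {
    "Announced": "identified",
    "Planning": "planning",
    "Permitting": "permitting",
    "Under Construction": "construction",
    "Operational": "operational",
}


def apply_phase(project, new_phase, history, manual=False):
    """Write both columns through one mapping; log live stage transitions."""
    new_stage = PHASE_TO_STAGE[new_phase]
    old_stage = project.get("lifecycle_stage")
    if not manual and old_stage is not None:
        # Pipeline advances stay monotonic: ignore same-or-earlier stages.
        if STAGE_ORDER.index(new_stage) <= STAGE_ORDER.index(old_stage):
            return False
    # Manual edits are authoritative and may move a project backwards.
    project["current_phase"] = new_phase
    project["lifecycle_stage"] = new_stage
    if new_stage != old_stage:
        history.append({
            "from_stage": old_stage,
            "to_stage": new_stage,
            "source": "forward",  # live transition with a real timestamp
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return True


project = {"current_phase": "Planning", "lifecycle_stage": "planning"}
# One reconstructed row per project, as a migration backfill would leave it.
history = [{"from_stage": None, "to_stage": "planning",
            "source": "backfill_approximate", "at": None}]

advanced = apply_phase(project, "Permitting", history)              # pipeline advance
regressed = apply_phase(project, "Planning", history)               # pipeline regress, ignored
corrected = apply_phase(project, "Planning", history, manual=True)  # manual override

# Only live transitions are usable for elapsed-time modelling.
usable = [h for h in history if h["source"] == "forward"]
```

Filtering on `source` at read time keeps the approximate backfill rows available for display without letting them contaminate any elapsed-time calculation.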
project_type (default utility_solar), owner_entity, legal_entity, legal_status, lifecycle_stage, and signals columns on projects. owner_entity is backfilled from the resolved developer link; the new project-creation path sets the spine fields directly.
current_phase maps to a funnel-position lifecycle_stage (identified → planning → permitting → … → operational) through a single shared vocabulary (processors/stage_vocab.py). The pipeline and manual edits write both columns through it so they cannot diverge; pipeline advances stay monotonic, manual edits are authoritative.
lifecycle_stage transitions. Live transitions (pipeline resolution + manual phase edits) are logged as source='forward', the unbackfillable elapsed-time moat. Reconstructed rows are marked source='backfill_approximate' and must be excluded from any elapsed-time model.
projects, creates stage_history (+ index), and backfills lifecycle_stage, owner_entity, and one approximate stage_history row per project. Deterministic, pure-SQL, idempotent. current_phase retained.

Bulk cleanup of noisy entities on project pages, plus lead/role classification to filter non-decision-makers at extraction time.
activity_log as one counted entry (not one per row). New delete_people_batch() and unlink_project_companies_batch() model helpers; new remove-people-bulk / remove-companies-bulk routes (company rows encode the composite id:role key).
lead, support, or noise. noise people (county/city officials, board members, residents, landowners, journalists, etc.) are dropped at storage time; the count is tracked per run in pipeline_runs.people_skipped_noise. The project detail page hides non-lead people by default behind a "Show all" toggle and badges support / noise / unknown rows.
role_classification TEXT DEFAULT 'unknown' to people and people_skipped_noise INTEGER DEFAULT 0 to pipeline_runs, plus an index on people(role_classification).

Project deletion, AI timeline regeneration, merge audit trail, and CRM tracking checkboxes.
project_articles, project_companies, project_timeline; people records are preserved with project_id nulled. Deletions are recorded in activity_log with project_id=NULL and the deleted project label preserved for audit.
merge_projects() now writes a project_merged entry to the target project for each source merged in, surfaced in the existing project detail Activity Log table. The source project's edit history is re-parented onto the target so no history is lost.
processors/timeline_regenerator.py handles the API call; route shows a loading state while the 10–20s call runs.
in_network_engine and in_master_db boolean flags on both projects and people. Project checkboxes appear on the project detail page; people checkboxes appear on the people list page. AJAX toggles update immediately. Project list page has a new filter dropdown ("Not in NetworkEngine", "Not in Master DB") to surface unexported records.
in_network_engine and in_master_db columns to projects and people, plus indexes on the project columns.

Cheaper, fresher, broader-coverage daily pipeline runs.
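One way the cheaper runs could work is sketched below: drop stale feed entries, then drop already-stored URLs before any full-text fetch. It assumes feed entries are dict-like with a feedparser-style `published_parsed` UTC struct_time and a `link`; the helper names `within_lookback()` and `drop_known()` are hypothetical. COLLECTION_LOOKBACK_DAYS and its default of 7 come from this release's entries.

```python
import calendar
from datetime import datetime, timedelta, timezone

COLLECTION_LOOKBACK_DAYS = 7


def within_lookback(entries, now=None, days=COLLECTION_LOOKBACK_DAYS):
    """Keep entries published inside the window; undated entries are kept."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    kept = []
    for entry in entries:
        parsed = entry.get("published_parsed")  # UTC struct_time or None
        if parsed is None:
            kept.append(entry)  # no parseable publish date: keep rather than lose it
            continue
        published = datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)
        if published >= cutoff:
            kept.append(entry)
    return kept


def drop_known(entries, existing_urls):
    """Dedup against stored URLs before the expensive full-text step."""
    return [e for e in entries if e["link"] not in existing_urls]


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
entries = [
    {"link": "a", "published_parsed": datetime(2025, 1, 14, tzinfo=timezone.utc).utctimetuple()},
    {"link": "b", "published_parsed": datetime(2025, 1, 1, tzinfo=timezone.utc).utctimetuple()},
    {"link": "c", "published_parsed": None},
]

recent = within_lookback(entries, now=now)
fresh = drop_known(recent, existing_urls={"c"})
recent_links = [e["link"] for e in recent]
fresh_links = [e["link"] for e in fresh]
```

Running both filters before full-text enrichment means only new, recent articles incur a page fetch and an extraction call.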
COLLECTION_LOOKBACK_DAYS (default 7) drops feed entries older than the window before processing. Articles with no parseable publish date are kept.
models.get_existing_urls() helper.
prnewswire.com/rss/... endpoint returned malformed XML (feedparser bozo error). Replaced with a Google News RSS query scoped to site:prnewswire.com plus solar keywords, giving the same source coverage via a reliable feed.
RSSCollector.collect_with_full_text() is now a thin wrapper over collect() + enrich_with_full_text(). The pipeline runner uses the split form so it can dedup between the two steps.

Standalone company creation and safer manual entity linking.
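Safer manual linking can be as simple as an exact-name lookup that falls back to inserting a new row. The sketch below assumes a `companies` table with `id` and `name` columns and a hypothetical `get_or_create_company()` helper; trimming whitespace before the comparison is also an assumption.

```python
import sqlite3


def get_or_create_company(conn, name):
    """Exact-name lookup, falling back to insert-new (no fuzzy matching)."""
    clean = name.strip()
    row = conn.execute(
        "SELECT id FROM companies WHERE name = ?", (clean,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO companies (name) VALUES (?)", (clean,))
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO companies (name) VALUES ('Solar Partners LLC')")

existing_id = get_or_create_company(conn, "Solar Partners LLC")
# A short name that is only a substring of an existing company gets its own row.
new_id = get_or_create_company(conn, "Solar")
total = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
```

A substring-based score would treat "Solar" as a perfect partial match for "Solar Partners LLC"; the exact comparison never merges the two.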
/companies/add endpoint backing the new form.
resolve_company), which could silently map a new name to an unrelated existing company via short substring matches (partial_ratio × 0.9 crossing the 90% threshold). Manual entry now uses exact-name lookup, falling back to insert-new; the resolver remains in place for LLM-extracted names where fuzzy matching is desired.

Manual article import, article-project linking, video content detection, and pagination.
content_type field on articles (article, video, mixed). Filterable on the articles page.
GET /articles/api/search-projects?q=... endpoint for autocomplete lookups.
_process_extraction() helper in articles routes, used by both manual import and re-analyze, eliminating code duplication.
content_type for all collected articles.
content_type TEXT DEFAULT 'article' column on articles table.
.pytest_cache/ and data/backups/ to .gitignore
=22.0, =3.10)
sessions/archive/

Parjana branding, GaiaOps attribution, public homepage, and changelog page.
base.html
/ see a landing page with SEO meta tags (title, description), Parjana logo, project description, and login link. Authenticated users still see the dashboard.
/changelog route (public, no auth) renders CHANGELOG.md to HTML via the markdown library. Linked from footer.
<meta name="robots" content="noindex"> on all dashboard pages. Public homepage is the only indexed page.
/ route: No longer requires authentication; shows public homepage for anonymous visitors, dashboard for authenticated users
markdown>=3.3 for changelog rendering
dashboard/static/img/favicon.png: Parjana stacked layers icon
dashboard/static/img/parjana-logo-icon.jpg: Parjana icon (standalone)
dashboard/static/img/parjana-logo-full.webp: Parjana full logo with text

Production reliability improvements: gunicorn WSGI server, daily database backups.
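A daily backup with date-stamped files and a 7-day retention window might look like the sketch below. It uses the standard-library `sqlite3.Connection.backup()` API; whether the project copies the file this way is an assumption, as are the helper names `backup_database()` and `prune_backups()`. The `solarscout-YYYY-MM-DD.db` naming and the 7-day retention come from this release's entries.

```python
import sqlite3
import tempfile
from datetime import date, timedelta
from pathlib import Path

RETENTION_DAYS = 7


def prune_backups(backup_dir, today):
    """Delete date-stamped backups that fall outside the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    removed = []
    for path in sorted(Path(backup_dir).glob("solarscout-*.db")):
        stamp = path.stem.replace("solarscout-", "", 1)
        try:
            taken = date.fromisoformat(stamp)
        except ValueError:
            continue  # not one of our date-stamped files; leave it alone
        if taken <= cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


def backup_database(db_path, backup_dir, today=None):
    """Copy the live database to solarscout-YYYY-MM-DD.db, then prune."""
    today = today or date.today()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"solarscout-{today.isoformat()}.db"
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(target))
    try:
        src.backup(dst)  # consistent snapshot of the source database
    finally:
        dst.close()
        src.close()
    prune_backups(backup_dir, today)
    return target


with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live.db"
    conn = sqlite3.connect(str(live))
    conn.execute("CREATE TABLE t (x)")
    conn.commit()
    conn.close()

    backups = Path(tmp) / "backups"
    backups.mkdir()
    (backups / "solarscout-2025-01-01.db").touch()  # outside the window
    (backups / "solarscout-2025-01-09.db").touch()  # inside the window

    backup_database(live, backups, today=date(2025, 1, 10))
    remaining = sorted(p.name for p in backups.glob("*.db"))
```

Pruning by the date in the filename, rather than by file modification time, keeps retention predictable even if backup files are copied or restored.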
--workers 1 --timeout 120). Eliminates dev server warning in production.
/data/backups/solarscout-YYYY-MM-DD.db with 7-day retention and auto-pruning.
run_dashboard.py refactored so DB init, app creation, and scheduler start happen at module load, which works for both gunicorn import and direct execution.

First production deployment to Railway with custom domain.
/run-pipeline endpoint (POST, auth required) for triggering pipeline runs programmatically
/pipeline-status returns whether a pipeline run is currently in progress
claude-sonnet-4-5-20250514 to claude-sonnet-4-6
ross-gaiaops852/solar-scout (private)
main branch
/data for SQLite persistence across deploys
.env in production)

Dashboard usability improvements and activity tracking.
.format() on extraction prompt was interpreting JSON schema braces as template variables (KeyError). Switched to .replace().
activity_log table with project_id, field_name, old/new values, timestamp.

First working end-to-end pipeline run.
/health returns JSON stats without auth (Railway health checks)
.format() was interpreting JSON braces as template variables
sqlite3.Row objects don't support .get(); converted to dicts at resolver boundary
_developer field was referenced but never populated; now queries project_companies table
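Two of the fixes above are easy to reproduce in isolation. The sketch below uses a made-up prompt template and a made-up `projects` table; only the behaviour of `str.format()`, `str.replace()`, and `sqlite3.Row` is standard Python.

```python
import sqlite3

# Hypothetical extraction prompt: a JSON example plus one real placeholder.
PROMPT = ('Return JSON like {"name": "...", "capacity_mw": 0} '
          'for this article:\n{article_text}')


def build_prompt(article_text):
    # str.replace only touches the literal placeholder, so JSON braces are safe.
    return PROMPT.replace("{article_text}", article_text)


# str.format treats every {...} as a replacement field, including the JSON.
try:
    PROMPT.format(article_text="Some article")
    format_error = None
except KeyError as exc:
    format_error = type(exc).__name__

prompt = build_prompt("Some article")

# sqlite3.Row supports index and key access but has no .get() method.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO projects (name) VALUES ('Alpha')")
row = conn.execute("SELECT id, name FROM projects").fetchone()

row_has_get = hasattr(row, "get")
project = dict(row)                   # convert once, at the boundary
developer = project.get("developer")  # missing keys now return None safely
```

Converting rows to dicts at a single boundary keeps downstream code free to use `.get()` with defaults, and escaping braces as `{{` and `}}` would be an alternative to `.replace()` if `str.format()` were kept.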