MOJAVE Scraper
scripts/Mojave_html_to_csv.py
Fetches the MOJAVE VLBA velocity table from NRAO, parses the HTML, splits ± columns into separate value/error fields, and saves a cleaned CSV.
Usage
python scripts/Mojave_html_to_csv.py
Requires network access. The output file mojave_cleaned.csv is written to the project root.
Data Source
The script fetches the HTML table from:
https://www.cv.nrao.edu/MOJAVE/velocitytableXVIII.html
This is the MOJAVE XVIII velocity table, which contains proper-motion measurements for AGN jet components observed with the VLBA.
Processing Steps
- Fetch: Downloads the HTML page and unescapes HTML entities (e.g., the HTML entity for ± becomes the literal ± character).
- Parse: Uses pandas.read_html() to extract the first table.
- Clean headers: Drops the first 3 rows (multi-level headers) and the last row (footer).
- Split ±: For every column containing ±, creates two new columns: <col>_val and <col>_err.
- Convert: Applies pd.to_numeric where possible.
- Save: Writes to mojave_cleaned.csv at the project root.
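The unescape, split, and convert steps above can be sketched on a small in-memory example. This is an illustration under stated assumptions, not the script's actual code: the helper name split_pm_columns, the sample column names, and all values are made up, and the network fetch and pandas.read_html() call are left out so the snippet runs offline.

```python
import html

import pandas as pd

# Hypothetical fragment standing in for the downloaded page text.
raw = "<td>12.3 &plusmn; 0.4</td>"
text = html.unescape(raw)  # "&plusmn;" becomes the literal "±"

# Hypothetical frame mimicking parsed table cells (values are made up).
df = pd.DataFrame({
    "source": ["src_a", "src_b"],
    "mu": ["12.3 ± 0.4", "5.0 ± 1.1"],
})


def split_pm_columns(frame):
    """Add <col>_val and <col>_err for every column that contains '±'."""
    out = frame.copy()
    for col in frame.columns:
        as_text = frame[col].astype(str)
        if not as_text.str.contains("±", regex=False).any():
            continue  # no ± in this column, leave it alone
        parts = as_text.str.split("±", n=1, expand=True)
        # Cells that fail to parse become NaN instead of raising.
        out[f"{col}_val"] = pd.to_numeric(parts[0].str.strip(), errors="coerce")
        out[f"{col}_err"] = pd.to_numeric(parts[1].str.strip(), errors="coerce")
    return out


cleaned = split_pm_columns(df)
print(cleaned)
```

The original column is kept alongside the two new ones, which matches the description of the output CSV as preserving all original columns. Using errors="coerce" is one possible reading of "where possible"; the script may handle unparseable cells differently.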
Output CSV
The cleaned CSV preserves all original columns from the MOJAVE table. Columns that originally contained ± notation are split into separate value and error columns.
Key columns used by the MCMC models (via simulationImport):
| Column index | Content |
|---|---|
| 13 | Transverse velocity (value) |
| 14 | Transverse velocity (uncertainty) |
The full CSV contains source names, component IDs, proper motions, position angles, and other VLBA measurements.
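As a rough sketch of how the two key columns could be pulled out by position, the snippet below reads a synthetic 15-column CSV with pandas. It is not the simulationImport implementation. It assumes the indices in the table are zero-based positions, which the table does not state, and the column names and values are placeholders rather than real MOJAVE data.

```python
import io

import pandas as pd

# Synthetic 15-column CSV standing in for mojave_cleaned.csv (values are made up).
header = ",".join(f"c{i}" for i in range(15))
rows = [",".join(str(r * 100 + i) for i in range(15)) for r in range(3)]
csv_text = "\n".join([header, *rows])

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: 13 and 14 are zero-based positional indices.
v_trans = df.iloc[:, 13].to_numpy(dtype=float)      # transverse velocity (value)
v_trans_err = df.iloc[:, 14].to_numpy(dtype=float)  # transverse velocity (uncertainty)

print(v_trans, v_trans_err)
```

Selecting by position is fragile if the upstream table gains or loses columns, so checking the column names at those positions before fitting is a reasonable safeguard.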