Produces harmonized bilateral trade estimates by reconciling exporter- and importer-reported UN Comtrade data using a reliability-weighted method.
Transforms UN Comtrade data into clean bilateral trade data through a mirroring process that reconciles discrepancies between exporter- and importer-reported values. This methodology underpins the bilateral trade data published in the Atlas of Economic Complexity.
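To make the mirroring idea concrete, here is a minimal sketch. It assumes each set of reports is a dictionary keyed by (year, reporter, partner, commodity code). The function name `pair_mirror_reports` and this data layout are hypothetical and are not the pipeline's API. The sketch flips each importer report so that it reads exporter to importer, then lines it up with the exporter's own report of the same flow.

```python
from typing import Dict, Optional, Tuple

# (year, reporter, partner, commodity code) as submitted by a reporting country
ReportKey = Tuple[int, str, str, str]
# (year, exporter, importer, commodity code) identifying one bilateral flow
FlowKey = Tuple[int, str, str, str]


def pair_mirror_reports(
    exporter_reports: Dict[ReportKey, float],
    importer_reports: Dict[ReportKey, float],
) -> Dict[FlowKey, Tuple[Optional[float], Optional[float]]]:
    """Align each exporter report with the importer's mirror report of the same flow.

    Hypothetical helper. In exporter_reports the reporter is the exporter; in
    importer_reports the reporter is the importer. Returns
    (value_exporter, value_importer) per flow, with None where one side did
    not report.
    """
    # Flip importer reports so every key reads exporter -> importer
    mirrored = {
        (year, partner, reporter, code): value
        for (year, reporter, partner, code), value in importer_reports.items()
    }
    # Keep every flow reported by at least one side
    flows = set(exporter_reports) | set(mirrored)
    return {
        flow: (exporter_reports.get(flow), mirrored.get(flow))
        for flow in sorted(flows)
    }
```

Suppose the USA reports exports of 100.0 to Mexico, and Mexico reports imports of 110.0 from the USA for the same year and commodity. The flow `(2020, "USA", "MEX", "8703")` then pairs as `(100.0, 110.0)`. A flow reported by only one side gets `None` on the missing side.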
- Python 3.10+
- Poetry for managing dependencies
- FRED API key (request one from the FRED website)
- Comtrade data files (downloaded with comtrade-downloader)
git clone https://github.com/your-org/comtrade-mirroring.git
cd comtrade-mirroring
poetry install && poetry shell
# Set up environment variables
export FRED_API_KEY="your_fred_key_here"
- Configure processing settings in mirror/user_config.py
- Run the pipeline from the mirror/ directory: python main.py
- Find results in your configured output directory
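The FRED key is supplied through an environment variable, so a pre-flight check can catch a missing key before a long run starts. The helper below is hypothetical (`require_env` is not part of the pipeline) and uses only the standard library.

```python
import os
import sys


def require_env(name: str = "FRED_API_KEY") -> str:
    """Return a non-empty environment variable or exit with a clear message.

    Hypothetical pre-flight helper; the pipeline's own handling may differ.
    """
    # Treat unset and whitespace-only values the same way
    value = os.environ.get(name, "").strip()
    if not value:
        sys.exit(f"{name} is not set; export it before running main.py")
    return value
```

Calling `require_env()` at the top of a wrapper script fails fast with a readable message, which is cheaper than discovering the missing key partway through processing.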
Edit user_config.py:
# Choose which trade classifications to process
PROCESS_SITC = False # SITC data from 1962-END_YEAR
PROCESS_HS92 = True # HS92 data from 1992-END_YEAR
PROCESS_HS12 = True # HS12 data from 2012-END_YEAR
# Test mode - only process recent years
TEST_MODE = True
TEST_START_YEAR = 2020
END_YEAR = 2023
# Path to downloaded Comtrade files
DOWNLOADED_FILES_PATH = "/path/to/downloaded/comtrade/data"
# Results output directory
FINAL_OUTPUT_PATH = "/path/to/output/directory"
PROCESSING_STEPS = {
    "run_cleaning": True,  # Main bilateral trade cleaning pipeline
    "delete_intermediate_files": True,  # Clean up intermediate files
}
SITC (Standard International Trade Classification):
- SITC Revision 1 (1962-current)
- SITC Revision 2 (1976-current)
- SITC Revision 3 (1988-current)
HS (Harmonized System):
- HS1992 (1992-current)
- HS1996 (1996-current)
- HS2002 (2002-current)
- HS2007 (2007-current)
- HS2012 (2012-current)
- HS2017 (2017-current)
- HS2022 (2022-current)
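The H0 and H4 directory names used by the pipeline match UN Comtrade's classification codes. The sketch below records those codes together with the start years listed above. It also shows how the `END_YEAR`, `TEST_MODE`, and `TEST_START_YEAR` settings could translate into a year range. `years_to_process` is a hypothetical helper. Treating test mode as "start at TEST_START_YEAR, but never before the classification existed" is an assumption.

```python
from typing import Dict, List, Tuple

# UN Comtrade classification codes -> (vintage, first year), using the start
# years listed above.
COMTRADE_CLASSIFICATIONS: Dict[str, Tuple[str, int]] = {
    "S1": ("SITC Revision 1", 1962),
    "S2": ("SITC Revision 2", 1976),
    "S3": ("SITC Revision 3", 1988),
    "H0": ("HS1992", 1992),
    "H1": ("HS1996", 1996),
    "H2": ("HS2002", 2002),
    "H3": ("HS2007", 2007),
    "H4": ("HS2012", 2012),
    "H5": ("HS2017", 2017),
    "H6": ("HS2022", 2022),
}


def years_to_process(
    code: str,
    end_year: int,
    test_mode: bool = False,
    test_start_year: int = 2020,
) -> List[int]:
    """Inclusive list of years one classification would cover.

    Hypothetical helper mirroring user_config.py-style settings. Assumption:
    test mode starts at test_start_year but never before the classification
    existed.
    """
    _, first_year = COMTRADE_CLASSIFICATIONS[code]
    start = max(first_year, test_start_year) if test_mode else first_year
    return list(range(start, end_year + 1))
```

With `TEST_MODE = True`, `TEST_START_YEAR = 2020`, and `END_YEAR = 2023`, HS92 (`"H0"`) would cover 2020 through 2023. HS2022 (`"H6"`) would cover only 2022 and 2023, because it did not exist earlier.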
Mirrored trade data is saved as:
{FINAL_OUTPUT_PATH}/{DATA_VERSION}/mirrored_output/
├── H0/ # HS92 bilateral trade data
│ ├── H0_2020.parquet
│ ├── H0_2021.parquet
│ └── ...
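Each trade file contains the columns year, exporter, importer, commoditycode, value_final, value_exporter, and value_importer. value_exporter and value_importer hold the values reported by the exporter and the importer, and value_final holds the reconciled value.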
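A small helper for locating a given year's file, assuming the layout shown above. `mirrored_output_file` is hypothetical. The `data_version` argument stands in for DATA_VERSION, which is not defined in the configuration excerpt, so any value passed here is illustrative.

```python
from pathlib import Path

# Columns listed for each mirrored output file
OUTPUT_COLUMNS = [
    "year", "exporter", "importer", "commoditycode",
    "value_final", "value_exporter", "value_importer",
]


def mirrored_output_file(
    final_output_path: str, data_version: str, classification: str, year: int
) -> Path:
    """Build {FINAL_OUTPUT_PATH}/{DATA_VERSION}/mirrored_output/{code}/{code}_{year}.parquet."""
    return (
        Path(final_output_path)
        / data_version
        / "mirrored_output"
        / classification
        / f"{classification}_{year}.parquet"
    )
```

With pandas and a Parquet engine such as pyarrow installed, `pandas.read_parquet(path, columns=OUTPUT_COLUMNS)` loads the file. The `columns` argument restricts reading to the listed fields.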
The mirroring pipeline consists of five processing steps:
The final output provides reconciled trade values that combine exporter and importer reports based on a country reporting-reliability network.
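As an illustration of reliability weighting, the sketch below combines the two reports of one flow using a weighted average. The function `reconcile_flow`, the reliability scores, and the averaging rule are all assumptions made for illustration. The pipeline's actual reliability network and combination rule may differ; for example, it may favor one report outright instead of averaging.

```python
from typing import Optional


def reconcile_flow(
    value_exporter: Optional[float],
    value_importer: Optional[float],
    exporter_reliability: float,
    importer_reliability: float,
) -> Optional[float]:
    """Combine one flow's two reports into a single estimate.

    Illustrative rule (an assumption, not the pipeline's exact formula):
    - if only one side reported, use that report;
    - otherwise take a reliability-weighted average of the two reports.
    """
    if value_exporter is None and value_importer is None:
        return None
    if value_importer is None:
        return value_exporter
    if value_exporter is None:
        return value_importer
    total = exporter_reliability + importer_reliability
    if total <= 0:
        # No usable reliability information: fall back to a simple average
        return (value_exporter + value_importer) / 2
    return (
        exporter_reliability * value_exporter
        + importer_reliability * value_importer
    ) / total
```

Take reliability scores of 1.0 for the exporter and 3.0 for the importer. Reports of 100.0 and 110.0 then combine to 107.5, pulled toward the more reliable importer's figure.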
comtrade-mirroring/
├── mirror/
│ ├── main.py # Main entry point
│ ├── user_config.py # Configuration
│ ├── src/
│ │ ├── objects/
│ │ ├── table_objects/
│ │ └── utils/
├── logs/
├── images/
├── data/
│ └── static/
├── pyproject.toml # Python dependencies
└── README.md # This file
The pipeline expects downloaded Comtrade data in this structure:
{DOWNLOADED_FILES_PATH}/
├── H0/ # HS92 classification
│ ├── H0_2020.parquet
│ ├── H0_2021.parquet
│ └── ...
├── H4/ # HS12 classification
│ ├── H4_2020.parquet
│ └── ...
└── SITC/ # SITC classification
├── SITC_2020.parquet
└── ...
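Before a run, it can help to check which years are actually present. This sketch assumes the `{CODE}/{CODE}_{YEAR}.parquet` naming shown above. `available_input_years` is a hypothetical helper and is not part of the pipeline.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches files such as H0_2020.parquet or SITC_2020.parquet
FILE_PATTERN = re.compile(r"^(?P<code>[A-Z0-9]+)_(?P<year>\d{4})\.parquet$")


def available_input_years(downloaded_files_path: str) -> Dict[str, List[int]]:
    """Map each classification directory to the sorted years found inside it."""
    found: Dict[str, List[int]] = {}
    for class_dir in sorted(Path(downloaded_files_path).iterdir()):
        if not class_dir.is_dir():
            continue
        years = []
        for parquet_file in class_dir.glob("*.parquet"):
            match = FILE_PATTERN.match(parquet_file.name)
            # Only count files whose prefix matches their directory name
            if match and match.group("code") == class_dir.name:
                years.append(int(match.group("year")))
        found[class_dir.name] = sorted(years)
    return found
```

Comparing this result against the years implied by the configuration shows missing downloads before processing starts.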
- Memory: 16 GB or more of RAM recommended for full processing
- Storage: at least 30 GB of free disk space for data files
Apache License, Version 2.0 - see LICENSE file.
@Misc{comtrade_mirroring,
author={Harvard Growth Lab},
title={Comtrade Mirroring Pipeline},
year={2025},
howpublished = {\url{https://github.com/harvard-growth-lab/comtrade-mirroring}},
}