Comtrade Mirroring

Produces harmonized bilateral trade estimates by reconciling exporter- and importer-reported UN Comtrade data using a reliability-weighted method.

What This Does

Transforms raw UN Comtrade data into clean bilateral trade data through a mirroring process that reconciles discrepancies between exporter-reported and importer-reported values. This methodology underpins the bilateral trade data published in the Atlas of Economic Complexity.

Prerequisites

Installation

git clone https://github.com/harvard-growth-lab/comtrade-mirroring.git
cd comtrade-mirroring
poetry install && poetry shell

# Set up environment variables
export FRED_API_KEY="your_fred_key_here"

Quick Start

  1. Configure processing settings in user_config.py
  2. Run the pipeline: python main.py
  3. Find results in your configured output directory

Configuration

Edit user_config.py:

Select Classifications to Process

# Choose which trade classifications to process
PROCESS_SITC = False   # SITC data from 1962-END_YEAR
PROCESS_HS92 = True    # HS92 data from 1992-END_YEAR
PROCESS_HS12 = True    # HS12 data from 2012-END_YEAR

# Test mode - only process recent years
TEST_MODE = True
TEST_START_YEAR = 2020
END_YEAR = 2023
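As a rough illustration of how these settings interact, the sketch below derives the years a run would cover for one classification. The helper `years_to_process` and the `FIRST_YEAR` table are hypothetical and not part of the repository; the start years come from the config comments above, and the assumption is that test mode simply moves the start year forward to `TEST_START_YEAR`.

```python
# Hypothetical helper for illustration; not part of the repository.
# First available year per classification, taken from the config comments.
FIRST_YEAR = {"SITC": 1962, "HS92": 1992, "HS12": 2012}


def years_to_process(classification, end_year, test_mode=False, test_start_year=None):
    """Return the list of years a run would cover for one classification."""
    start = FIRST_YEAR[classification]
    if test_mode:
        if test_start_year is None:
            raise ValueError("test_start_year is required when test_mode is True")
        # Assumption: test mode never starts before the classification exists.
        start = max(start, test_start_year)
    if start > end_year:
        return []
    return list(range(start, end_year + 1))


if __name__ == "__main__":
    # Mirrors TEST_MODE = True, TEST_START_YEAR = 2020, END_YEAR = 2023.
    print(years_to_process("HS92", 2023, test_mode=True, test_start_year=2020))
    # → [2020, 2021, 2022, 2023]
```

With `TEST_MODE = False`, the same call would span every year from the classification's first year through `END_YEAR`.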

Set Data Paths

# Path to downloaded Comtrade files
DOWNLOADED_FILES_PATH = "/path/to/downloaded/comtrade/data"

# Results output directory
FINAL_OUTPUT_PATH = "/path/to/output/directory"

Processing Steps

PROCESSING_STEPS = {
    "run_cleaning": True,              # Main bilateral trade cleaning pipeline
    "delete_intermediate_files": True, # Clean up intermediate files
}

Supported Classifications

SITC (Standard International Trade Classification):

  • SITC Revision 1 (1962-current)
  • SITC Revision 2 (1976-current)
  • SITC Revision 3 (1988-current)

HS (Harmonized System):

  • HS1992 (1992-current)
  • HS1996 (1996-current)
  • HS2002 (2002-current)
  • HS2007 (2007-current)
  • HS2012 (2012-current)
  • HS2017 (2017-current)
  • HS2022 (2022-current)

Output

Mirrored trade data is saved as:

{FINAL_OUTPUT_PATH}/{DATA_VERSION}/mirrored_output/
├── H0/                    # HS92 bilateral trade data
│   ├── H0_2020.parquet
│   ├── H0_2021.parquet
│   └── ...

Each trade file contains: year, exporter, importer, commoditycode, value_final, value_exporter, value_importer
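To locate a single output file programmatically, a small path helper along these lines may be convenient. The function `mirrored_file` is a hypothetical name, and the `"v1"` data version below is a placeholder; only the directory layout itself is taken from the tree above.

```python
from pathlib import Path


def mirrored_file(final_output_path, data_version, classification_code, year):
    """Build the expected path of one mirrored output file.

    Follows the layout shown above:
    {FINAL_OUTPUT_PATH}/{DATA_VERSION}/mirrored_output/{code}/{code}_{year}.parquet
    """
    return (
        Path(final_output_path)
        / data_version
        / "mirrored_output"
        / classification_code
        / f"{classification_code}_{year}.parquet"
    )


if __name__ == "__main__":
    # "v1" is a placeholder data version, not a value from the repository.
    path = mirrored_file("/path/to/output/directory", "v1", "H0", 2020)
    print(path.as_posix())
```

The resulting path can then be passed to a Parquet reader such as `pandas.read_parquet` to load the columns listed above.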

How It Works

The mirroring pipeline consists of five processing steps:

1. Preprocess and aggregate trade data

2. Adjust values from CIF to FOB

3. Compute country reporting reliability scores

4. Reconcile country-pair trade totals

5. Reconcile product-level trade values

The final output provides reconciled trade values that combine exporter and importer reports, weighted according to a network of country reporting reliability scores.
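The idea behind steps 2 and 5 can be sketched for a single trade flow as follows. This is a toy illustration under stated assumptions, not the pipeline's actual method: the flat 8% CIF margin, the function names `cif_to_fob` and `reconcile`, and the simple proportional weighting are all hypothetical, and the real reliability scores come from the network computation in step 3.

```python
def cif_to_fob(cif_value, cif_margin=0.08):
    """Convert an importer-reported CIF value to an FOB basis.

    Assumption: a flat margin for freight and insurance. The 8% default
    is illustrative only and is not taken from the pipeline.
    """
    return cif_value / (1.0 + cif_margin)


def reconcile(value_exporter, value_importer, exporter_reliability, importer_reliability):
    """Combine two reports of the same flow, weighted by reporter reliability.

    Assumption: weights are proportional to the two reliability scores.
    If only one side reported, that report is used as-is.
    """
    if value_exporter is None:
        return value_importer
    if value_importer is None:
        return value_exporter
    total = exporter_reliability + importer_reliability
    if total == 0:
        # No reliability information: fall back to a simple average.
        return 0.5 * (value_exporter + value_importer)
    weight = exporter_reliability / total
    return weight * value_exporter + (1.0 - weight) * value_importer


if __name__ == "__main__":
    importer_fob = cif_to_fob(129.6)  # roughly 120 on an FOB basis
    value_final = reconcile(100.0, importer_fob, 0.75, 0.25)
    print(round(value_final, 2))
```

In this sketch a more reliable exporter pulls `value_final` toward `value_exporter`, which mirrors the relationship between the `value_exporter`, `value_importer`, and `value_final` columns in the output files.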

Repository Structure

comtrade-mirroring/
├── mirror/
│   ├── main.py                    # Main entry point
│   ├── user_config.py             # Configuration
│   ├── src/
│   │   ├── objects/
│   │   ├── table_objects/
│   │   └── utils/
│   ├── logs/
│   ├── images/
│   └── data/
│       └── static/
├── pyproject.toml                # Python dependencies
└── README.md                     # This file

Data Requirements

Input Data Structure

The pipeline expects downloaded Comtrade data in this structure:

{DOWNLOADED_FILES_PATH}/
├── H0/                    # HS92 classification
│   ├── H0_2020.parquet
│   ├── H0_2021.parquet
│   └── ...
├── H4/                    # HS12 classification  
│   ├── H4_2020.parquet
│   └── ...
└── SITC/                  # SITC classification
    ├── SITC_2020.parquet
    └── ...
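Before a long run, it can help to confirm that the expected input files are in place. The check below is a minimal sketch based only on the layout shown above; `missing_input_files` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path


def missing_input_files(downloaded_files_path, classification_code, years):
    """Return the expected input files that are not present on disk.

    Expects files named {code}/{code}_{year}.parquet under the download
    directory, as in the layout shown above.
    """
    root = Path(downloaded_files_path) / classification_code
    expected = [root / f"{classification_code}_{year}.parquet" for year in years]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Placeholder path from the configuration example; adjust to your setup.
    missing = missing_input_files("/path/to/downloaded/comtrade/data", "H0", range(2020, 2024))
    for path in missing:
        print(f"missing: {path}")
```

An empty result means every expected file for the requested years was found.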

Technical Details

System Requirements

  • Memory: at least 16GB of RAM recommended for full processing
  • Storage: at least 30GB of free disk space for input, intermediate, and output files

License

Apache License, Version 2.0 - see LICENSE file.

Citation

@Misc{comtrade_mirroring,
  author={Harvard Growth Lab},
  title={Comtrade Mirroring Pipeline},
  year={2025},
  howpublished = {\url{https://github.com/harvard-growth-lab/comtrade-mirroring}},
}
