Forest Products Trade Flow database
The objective of this database system is to provide the public with an up-to-date, neutral and comprehensive source of bilateral trade statistics on forest products through an online interface. The system modernises the way trade flow data are downloaded from UN COMTRADE, stored, validated and presented, adding value for the end-user. It also allows the data to be updated semi-automatically.
Trade flow data discrepancies and the need for data validation
Import and export data that specify both the country of origin and the country of destination are called dyadic or bilateral trade flow data. Such data provide useful insights into the state and development of globalised markets for forest and related products, and they serve as an input to macro-econometric models.
However, data quality varies greatly across products and countries: trade flow figures may be erroneous or missing. When the annual trade of a commodity between two countries is compared as reported by each partner, large discrepancies can appear for various reasons.
Each time country A exports a commodity to country B, a number of variables are recorded, typically by the customs authorities of each country: as an export in A and as an import in B. The recorded data include the value of the shipment and its quantity in one or more units, typically metric tonnes and cubic metres, although, depending on the commodity, many other quantity units may be used in different countries.
In principle, the quantity of a shipment should be the same in the origin and destination countries, and its value should be broadly similar. In many cases, however, the two records do not match, and it is difficult to judge which of them is correct.
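To make the comparison concrete, the sketch below measures how far an exporter's report and the partner's mirror import disagree. It assumes both figures refer to the same commodity, year and unit. The symmetric relative difference, the 20% tolerance and the function names are illustrative choices, not the measure used by the database.

```python
def relative_discrepancy(export_reported: float, import_mirror: float) -> float:
    """Symmetric relative gap between exporter A's and importer B's reports.

    Returns (import - export) / mean(import, export): 0 means perfect
    agreement, positive values mean B recorded more than A.
    """
    mean = (export_reported + import_mirror) / 2
    if mean == 0:
        return 0.0  # both partners report nothing: no discrepancy
    return (import_mirror - export_reported) / mean


def is_discrepant(export_reported: float, import_mirror: float,
                  tolerance: float = 0.2) -> bool:
    """Flag a flow whose partner reports differ by more than `tolerance` (illustrative)."""
    return abs(relative_discrepancy(export_reported, import_mirror)) > tolerance


# Country A reports exporting 1000 m3, country B reports importing 1500 m3.
print(relative_discrepancy(1000.0, 1500.0))  # 0.4
print(is_discrepant(1000.0, 1500.0))         # True
```

A symmetric measure treats neither partner as the reference, which matches the point that it is often unclear which report is correct.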
Some of the reasons for trade flow data discrepancies are listed below (not in order of importance):
- Triangular trade
- Product conversion in customs zones or free-trade areas
- Misreporting by one of the partner countries
- Non-reporting by one of the partner countries (the ‘mirror issue’)
- Differences between countries in methods of assessing trade value and quantity (particularly an issue with intra-EU trade)
- Partners reporting in different classification systems
- Time-shift effects, when goods leave a country in one period and arrive in the partner country in the next
- Erroneous indication or misinterpretation of units or currencies
- Erroneous conversion between quantity units or between currencies
- Confidentiality, e.g. when there are very few economic operators dealing in a particular commodity in a country
Building on the method developed by Michie and Wardle (2002), a data quality assessment and correction procedure was developed for three tasks. It handles outliers (i.e. data that are out of bounds and almost certainly erroneous). It mirrors missing data from the partner country's report. It also estimates missing quantity data, in cubic metres or metric tonnes depending on the commodity's international standard quantity unit.
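Mirroring can be sketched as follows. The sketch assumes flows are held in a dictionary keyed by (reporter, partner, flow direction, year). A missing or absent record is filled with the partner's report in the opposite direction. The data layout, the country codes and the function name are hypothetical. The actual procedure is implemented in R and described in the methodological report.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: (reporter, partner, flow, year) -> quantity, None if missing.
Key = Tuple[str, str, str, int]
OPPOSITE = {"export": "import", "import": "export"}


def fill_from_mirror(flows: Dict[Key, Optional[float]]) -> Dict[Key, Optional[float]]:
    """Return a copy of `flows` where missing records are taken from the partner's report.

    An export from A to B is mirrored by the import into B from A, and vice versa.
    Records that exist with a non-missing value are never overwritten.
    """
    filled = dict(flows)
    for (reporter, partner, flow, year), quantity in flows.items():
        if quantity is None:
            continue  # nothing to mirror from this record
        mirror_key = (partner, reporter, OPPOSITE[flow], year)
        if filled.get(mirror_key) is None:  # absent (non-reporting) or missing
            filled[mirror_key] = quantity
    return filled


flows = {
    ("FIN", "DEU", "export", 2014): 500.0,
    ("DEU", "FIN", "import", 2014): None,   # reported but quantity missing
    ("SWE", "DEU", "export", 2014): 300.0,  # DEU has no matching import record
}
filled = fill_from_mirror(flows)
print(filled[("DEU", "FIN", "import", 2014)])  # 500.0
print(filled[("DEU", "SWE", "import", 2014)])  # 300.0
```

The sketch copies quantities only. Mirrored values may need adjustment before use, since export and import values need not be recorded on the same basis.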
Data source and classification
Historically, the international body responsible for the customs classification of traded goods is the Customs Cooperation Council (CCC), established in 1952 and renamed the World Customs Organization (WCO) in 1994. Traded products were classified according to the Standard International Trade Classification (SITC) until the CCC adopted a new classification called the Harmonized System (HS). The International Convention on the Harmonized Commodity Description and Coding System was adopted in 1983, and the first version of the Harmonized System entered into use in 1988. The Harmonized System is a hierarchical classification in which commodities are encoded with unique 6-digit codes, each with a corresponding definition. The classification is revised roughly every 5 years. Successive revisions entered into force in 1992, 1996, 2002, 2007 and 2012, with the next one due in 2017. Each revision added or deleted codes. These changes reflected shifts in the importance of commodities in trade (in value or quantity), in policy or societal priorities, or both.
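Two properties of the HS described above are easy to express in code. The first is the hierarchical 6-digit code, where the first two digits give the chapter and the first four the heading. The second is the sequence of versions. The sketch below assumes that the version relevant for a given year is simply the latest one in force. In practice, COMTRADE data may be reported in an earlier revision, so this mapping is a simplification. The function names are hypothetical.

```python
import bisect

# Years in which an HS version entered into force, as given in the text.
HS_VERSION_YEARS = [1988, 1992, 1996, 2002, 2007, 2012, 2017]


def hs_levels(code: str) -> dict:
    """Split a 6-digit HS code into its chapter, heading and subheading."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit HS code, got {code!r}")
    return {"chapter": code[:2], "heading": code[:4], "subheading": code}


def hs_version_in_force(year: int) -> int:
    """Return the most recent HS version that had entered into force by `year`."""
    if year < HS_VERSION_YEARS[0]:
        raise ValueError(f"the HS was not yet in use in {year}")
    # bisect_right counts the versions whose entry year is <= `year`.
    return HS_VERSION_YEARS[bisect.bisect_right(HS_VERSION_YEARS, year) - 1]


print(hs_levels("440710"))        # {'chapter': '44', 'heading': '4407', 'subheading': '440710'}
print(hs_version_in_force(2004))  # 2002
```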
Please note that data updating depends on data availability from the primary source, COMTRADE. Typically it takes countries one to two years to provide COMTRADE with complete data.
Seven HS chapters currently relate to forest fibre commodities. Annual COMTRADE data for wood-based forest commodities were compiled from the following chapters:
- Chapter 44: Wood and articles of wood; wood charcoal
- Chapter 45: Cork and articles of cork
- Chapter 46: Manufactures of straw, of esparto or of other plaiting materials; basketware and wickerwork (including products made of bamboo and rattan)
- Chapter 47: Pulp of wood or of other fibrous cellulosic material; recovered paper and paperboard
- Chapter 48: Paper and paperboard; articles of paper pulp, of paper or of paperboard
- Chapter 49: Printed books, newspapers, pictures and other products of the printing industry; manuscripts, typescripts and plans
- Chapter 94: Furniture; bedding, mattresses, mattress supports, cushions and similar stuffed furnishings; lamps and lighting fittings, not elsewhere specified or included; illuminated signs, illuminated name-plates and the like; prefabricated buildings
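Selecting the relevant records then amounts to keeping codes whose first two digits belong to one of the seven chapters listed above. This is a minimal sketch with hypothetical names. Chapter 94, for example, also covers lamps and bedding, so an actual selection would likely need a finer, code-level product list.

```python
from typing import Iterable, List

# The seven HS chapters listed above.
FOREST_CHAPTERS = {"44", "45", "46", "47", "48", "49", "94"}


def is_forest_product(hs_code: str) -> bool:
    """True if the HS code falls in one of the forest-related chapters."""
    return hs_code[:2] in FOREST_CHAPTERS


def select_forest_codes(codes: Iterable[str]) -> List[str]:
    """Keep only the codes in forest-related chapters, preserving order."""
    return [code for code in codes if is_forest_product(code)]


print(select_forest_codes(["440710", "870323", "480100", "940360"]))
# ['440710', '480100', '940360']
```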
Annual data were initially loaded into the system starting from 2004. When the database was launched at the beginning of 2016, it contained data up to 2014. The database will be updated regularly.
The database system includes all the HS codes and definitions needed, in principle, to load data from 1992 to 2022. The plan is to gradually extend the dataset from the initially covered years (2004-2014) to this full range.
Data handling and cleaning method
The trade flow database interacts with three modules, illustrated in the following figure: a data extractor, a data cleaning module and a query interface. The data extractor automatically queries product trade flows from the COMTRADE data interface (COMTRADE API), parses them and inserts them into the database. The data cleaning module is implemented in the R statistical language, and trade flows are cleaned using an automated and reproducible procedure. The query interface, accessible through a web server, allows users to visualise trade flows and generate PDF reports.
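The division of labour between the three modules can be illustrated with a toy pipeline built on Python's standard sqlite3 module. An extractor step inserts already-parsed records into a raw table, a cleaning step writes a cleaned table, and a query step reads from it. The table layout, the record keys, the single cleaning rule and the sample records are assumptions made up for illustration. The real extractor talks to the COMTRADE API, and the real cleaning is done in R.

```python
import sqlite3
from typing import Iterable, List, Mapping, Tuple

COLUMNS = ("reporter TEXT, partner TEXT, flow TEXT, year INTEGER, "
           "hs_code TEXT, value REAL, quantity REAL")


def create_schema(con: sqlite3.Connection) -> None:
    # Raw data as extracted, and cleaned data as produced by the cleaning step.
    for table in ("raw_flow", "clean_flow"):
        con.execute(f"CREATE TABLE {table} ({COLUMNS})")


def load_raw(con: sqlite3.Connection, records: Iterable[Mapping]) -> None:
    """Extractor step: insert records already parsed from the API response."""
    con.executemany(
        "INSERT INTO raw_flow VALUES "
        "(:reporter, :partner, :flow, :year, :hs_code, :value, :quantity)",
        records,
    )


def clean(con: sqlite3.Connection) -> None:
    """Cleaning step (placeholder rule): keep only flows with a positive value."""
    con.execute("DELETE FROM clean_flow")
    con.execute("INSERT INTO clean_flow SELECT * FROM raw_flow WHERE value > 0")


def query(con: sqlite3.Connection, reporter: str, year: int) -> List[Tuple]:
    """Query step: cleaned flows of one reporter and year, largest first."""
    return con.execute(
        "SELECT partner, hs_code, value FROM clean_flow "
        "WHERE reporter = ? AND year = ? ORDER BY value DESC",
        (reporter, year),
    ).fetchall()


con = sqlite3.connect(":memory:")
create_schema(con)
load_raw(con, [
    {"reporter": "FIN", "partner": "DEU", "flow": "export", "year": 2014,
     "hs_code": "440710", "value": 1200.0, "quantity": 5.0},
    {"reporter": "FIN", "partner": "SWE", "flow": "export", "year": 2014,
     "hs_code": "440710", "value": 0.0, "quantity": 2.0},
])
clean(con)
print(query(con, "FIN", 2014))  # [('DEU', '440710', 1200.0)]
```

Keeping raw and cleaned data in separate tables mirrors the distinction, made in the reports section, between reports based on raw data and reports based on cleaned data.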
The following figure illustrates how the data manipulation steps are connected to handle missing quantity data, out-of-bounds prices and mirror information from the trade partner. Each step of the workflow is explained in more detail in the methodological report referenced below.
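The interplay between unit prices and quantities in this workflow can be sketched as follows. Unit prices (value divided by quantity) are compared against bounds around the median unit price of a product. A quantity that is missing, or whose unit price is out of bounds, is re-estimated as value divided by the median price. The bound factors of 0.2 and 5 and the function name are hypothetical. The bounds used in the actual procedure are described in the methodological report.

```python
import statistics
from typing import List, Optional, Tuple

# (trade value, reported quantity or None) for one product, e.g. across partners.
Flow = Tuple[float, Optional[float]]


def clean_quantities(flows: List[Flow], low_factor: float = 0.2,
                     high_factor: float = 5.0) -> List[Tuple[float, float, str]]:
    """Return (value, quantity, status), re-estimating missing or out-of-bounds quantities.

    Bounds are hypothetical multiples of the median unit price.
    """
    prices = [value / quantity for value, quantity in flows if quantity]
    if not prices:
        raise ValueError("no flow with both value and quantity to derive a price")
    median_price = statistics.median(prices)  # robust to a few outliers
    low, high = median_price * low_factor, median_price * high_factor

    cleaned = []
    for value, quantity in flows:
        if quantity:  # treats None and 0 as missing
            if low <= value / quantity <= high:
                cleaned.append((value, quantity, "reported"))
                continue
            status = "price_out_of_bounds"
        else:
            status = "quantity_missing"
        cleaned.append((value, value / median_price, status))
    return cleaned


flows = [(100.0, 1.0), (200.0, 2.0), (300.0, 3.0), (500.0, None), (100.0, 1000.0)]
for row in clean_quantities(flows):
    print(row)
```

Here the median unit price is 100, so the missing quantity becomes 5.0. The last flow, with an implausible unit price of 0.1, has its quantity re-estimated as 1.0.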
Query for data and reports
The forest products trade flow database can be queried through a website interface, which has two sub-components. The first component allows large amounts of data to be browsed easily through the following pre-formatted but customizable reports:
- Completeness report for one product and all countries, based on raw data.
- Discrepancy report for one reporter and product with all partners, based on raw data.
- Two types of overview report, based on cleaned data:
  - Value data for major flows for one reporter and all products
  - Quantity data for major flows for one reporter and all products
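The overview reports on major flows essentially group one reporter's flows by product and keep the largest partners. The sketch assumes simple (reporter, partner, product, value) tuples and a hypothetical cut-off of the top three partners per product. The criterion the reports actually use to define major flows is not specified here. The same function works on quantities if those are passed in place of values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str, float]  # (reporter, partner, product, value)


def major_flows(records: Iterable[Record], reporter: str,
                top_n: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    """For one reporter, list the `top_n` partners by value for every product."""
    by_product: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for rep, partner, product, value in records:
        if rep == reporter:
            by_product[product].append((partner, value))
    return {
        product: sorted(flows, key=lambda flow: flow[1], reverse=True)[:top_n]
        for product, flows in by_product.items()
    }


records = [
    ("FIN", "DEU", "4407", 120.0),
    ("FIN", "SWE", "4407", 80.0),
    ("FIN", "GBR", "4407", 150.0),
    ("FIN", "EST", "4407", 10.0),
    ("FIN", "DEU", "4801", 60.0),
    ("SWE", "FIN", "4407", 90.0),  # other reporter: ignored
]
print(major_flows(records, "FIN"))
# {'4407': [('GBR', 150.0), ('DEU', 120.0), ('SWE', 80.0)], '4801': [('DEU', 60.0)]}
```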
The second component provides a more standard data interface for querying trade flow data between a reporting country and multiple trade partner countries, in value or quantity, for a selected period. Free and open-source software was used to develop this website.
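A query of this second kind can be expressed as a filter over reporter, partners and period that returns either value or quantity. The TradeFlow record type, the query_flows function and the sample records are hypothetical. They only illustrate the query parameters described above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class TradeFlow:
    reporter: str
    partner: str
    year: int
    value: float
    quantity: Optional[float]


def query_flows(flows: Iterable[TradeFlow], reporter: str, partners: Set[str],
                start_year: int, end_year: int,
                metric: str = "value") -> List[Tuple[str, int, Optional[float]]]:
    """Return (partner, year, metric) for one reporter, selected partners and period."""
    if metric not in ("value", "quantity"):
        raise ValueError("metric must be 'value' or 'quantity'")
    return [
        (f.partner, f.year, getattr(f, metric))
        for f in flows
        if f.reporter == reporter
        and f.partner in partners
        and start_year <= f.year <= end_year
    ]


flows = [
    TradeFlow("FIN", "DEU", 2013, 100.0, 4.0),
    TradeFlow("FIN", "DEU", 2014, 120.0, 5.0),
    TradeFlow("FIN", "SWE", 2014, 80.0, None),
    TradeFlow("FIN", "EST", 2014, 10.0, 0.5),
]
print(query_flows(flows, "FIN", {"DEU", "SWE"}, 2014, 2014, metric="quantity"))
# [('DEU', 2014, 5.0), ('SWE', 2014, None)]
```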
The trade flow cleaning procedure was developed in R, and the code was published as EFI's first open-source tool. You are welcome to download the code and contribute to its development via EFI's GitHub site: https://github.com/EuropeanForestInstitute
It is intended to further expand the database with more detailed commodity data for global trade with EU countries, as recorded in the Eurostat COMEXT database, and to develop data validation and cleaning routines based on monthly data.
The current database relies entirely on the Harmonized System commodity classification and does not continue the dataset of the EFI-WFSE database. It is therefore also intended to compile cross-reference tables in order to produce trade flow estimates following the FAOSTAT commodity classification.
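A cross-reference table of this kind could be applied by summing HS-level flows into the target classification and reporting any codes that have no correspondence. The sketch assumes a simple many-to-one table. The item labels FAO_ITEM_1 and FAO_ITEM_2 are placeholders, not actual FAOSTAT items. Real correspondences that split one HS code across several items would need an additional splitting rule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Placeholder many-to-one correspondence: HS code -> target item (hypothetical labels).
HS_TO_TARGET = {"440710": "FAO_ITEM_1", "440799": "FAO_ITEM_2", "440729": "FAO_ITEM_2"}


def aggregate_to_target(hs_flows: Dict[str, float],
                        concordance: Dict[str, str]) -> Tuple[Dict[str, float], List[str]]:
    """Sum HS-level quantities per target item; also return HS codes without a match."""
    totals: Dict[str, float] = defaultdict(float)
    unmapped: List[str] = []
    for hs_code, quantity in hs_flows.items():
        item = concordance.get(hs_code)
        if item is None:
            unmapped.append(hs_code)
        else:
            totals[item] += quantity
    return dict(totals), unmapped


totals, unmapped = aggregate_to_target(
    {"440710": 10.0, "440799": 5.0, "440729": 2.5, "481810": 1.0}, HS_TO_TARGET)
print(totals)    # {'FAO_ITEM_1': 10.0, 'FAO_ITEM_2': 7.5}
print(unmapped)  # ['481810']
```

Returning the unmapped codes, rather than silently dropping them, makes gaps in the cross-reference table visible.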
Download the full methodological report
The full methodological report has been published in the Technical Reports series: Rougieux et al. 2017. The Forest Products Trade Flow Database – A reproducible method and tool to support the analysis of international forest products trade. EFI Technical Report 100, 2017.
The project was implemented by an enthusiastic team who achieved a great outcome with this relatively small project: Paul Rougieux, Jo Van Brusselen, Simo Varis, Sergey Zudin, Janne Kiljunen, Marko Lovric.
The development of this database was made possible by financial support from FLEGT Independent Market Monitoring. This multi-year project is supervised by ITTO and financed by the European Union (EU). Its aim is to monitor the trade impacts of the EU FLEGT Action Plan, including the EU Timber Regulation and the bilateral Voluntary Partnership Agreements (VPAs) between the EU and timber-supplying countries.
We would like to thank Steve Johnson, Rupert Oliver and Jean-Christophe Claudon at ITTO for their very constructive support and advice, based on their long-standing expertise in the analysis of trade flow data. We would also like to thank Ronald Jansen, Markie Muryawan, Daniel Eshetie and Nancy Snyder at UN COMTRADE for granting us unrestricted access and for helping us set up the connection to the database. We congratulate UN COMTRADE on this important step towards timely, fast and free access to their data through the COMTRADE API.
Special thanks also go to Ed Pepke and Tomi Tuomasjukka who, with their colleagues in the EU FLEGT Facility Analysis Team, originated the idea of redeveloping the EFI-WFSE forest products trade flow database into a more automated and open system for updating and validating data. That database was originally developed at EFI by Bruce Michie and Philip Wardle, whom we warmly acknowledge for their ground-breaking work. This database can be found here.
All mistakes remain ours.
Finally, we wish to thank those who will contribute to this living project in the future. Together with this report, we publish open-source code and templates in the R programming language. We invite the community of (forest products) trade flow researchers and analysts to help further develop and customise them. Welcome to the team!