Automatic analysis and scoring of corporate sustainability reports - a machine learning approach

by Amir Mohammad Shahi

Institution: Swinburne University of Technology
Department: Sarawak Campus. Faculty of Engineering, Computing and Science
Degree: Masters
Year: 2014
Keywords: Machine learning; Text mining; Supervised learning; Corporate sustainability report; Global reporting initiative; CSR; GRI
Record ID: 1052032
Full text PDF: http://hdl.handle.net/1959.3/380961


Sustainable development has become an important factor in a corporation's business success. In other words, business success is no longer judged solely on financial performance but also on a company's responsibility towards the environment and the community. Publishing periodic Corporate Sustainability Reports (CSR reports) has become increasingly popular among businesses as an accepted way of communicating their sustainable development efforts to current and prospective stakeholders. A CSR report usually contains several chapters of disclosures on the organization's sustainability and responsibility performance with respect to the economy, the environment and society. Many organizations, such as the International Organization for Standardization (ISO) and the Global Reporting Initiative (GRI), have published frameworks for developing CSR reports. ISO 14000 and GRI 3.0 (also referred to as G3) are by far the most popular frameworks published by these organizations, respectively. These frameworks also include guidelines for measuring the extent to which they have been applied in a CSR report, assigning the report an application level (or score). Currently, the application level of a CSR report is determined manually by framework experts through an exhaustive process of cross-checking the document against the framework. Given the rapid growth in the number of published CSR reports, together with the growth in available computing power, it is worthwhile to develop an intelligent document analysis system to which this task could be delegated. This research set out to demonstrate this idea by developing an intelligent CSR scoring software system that determines the application level of the GRI G3 framework for any given CSR report. This dissertation describes the features of this software and its significance for the industry.
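To make "application level" concrete, the sketch below scores a report once its disclosures have been reduced to a set of GRI indicator codes (for example `EC1`, `EN8`, `LA1`). It is not the thesis's implementation. It encodes a simplified reading of the G3 self-declared levels: C requires at least 10 performance indicators, including at least one economic, one environmental and one social indicator; B requires at least 20, including at least one from each of economic, environmental, labor practices, human rights, society and product responsibility; A requires every core indicator to be addressed; and "+" marks external assurance. Profile disclosures and management-approach requirements are deliberately omitted. The A check treats "addressed" as "disclosed". The function name `application_level` and the two-letter prefix convention are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Optional

# Two-letter indicator prefixes (assumed convention: code such as "EN8" -> "EN").
ECONOMIC, ENVIRONMENTAL = "EC", "EN"
# Social sub-categories: labor practices, human rights, society, product responsibility.
SOCIAL_SUBCATEGORIES = {"LA", "HR", "SO", "PR"}


def indicator_area(code: str) -> str:
    """Return the category prefix of an indicator code, e.g. 'EN8' -> 'EN'."""
    return code[:2]


def application_level(
    disclosed: set[str],
    core_indicators: set[str],
    externally_assured: bool = False,
) -> Optional[str]:
    """Simplified, hypothetical G3 application-level check.

    Only performance-indicator coverage is considered; profile and
    management-approach disclosures are ignored.
    """
    areas = {indicator_area(code) for code in disclosed}
    has_social = bool(areas & SOCIAL_SUBCATEGORIES)

    if core_indicators <= disclosed:
        level = "A"  # every core indicator is disclosed
    elif len(disclosed) >= 20 and ({ECONOMIC, ENVIRONMENTAL} | SOCIAL_SUBCATEGORIES) <= areas:
        level = "B"  # 20+ indicators spanning all six areas
    elif len(disclosed) >= 10 and {ECONOMIC, ENVIRONMENTAL} <= areas and has_social:
        level = "C"  # 10+ indicators with economic, environmental and social coverage
    else:
        return None  # below the lowest level
    return level + ("+" if externally_assured else "")
```

For example, a report disclosing `EC1`, `LA1` and `EN1` to `EN8` (ten indicators) reaches level C under these rules, or C+ if it is externally assured. The hard part in practice is the step this sketch assumes away: discovering which indicators a free-text report actually discloses.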
The dissertation also covers the design considerations and development challenges, followed by the proposed architectural decisions and solutions. Additionally, this research studied how likely businesses in various industrial sectors are to disclose particular information. This served two purposes: testing how effectively the developed software discovers disclosures, and classifying reporting industries by the level and amount of their corporate sustainability performance disclosures. The latter aimed to identify the most and least frequently disclosed chapters and indicators among the studied corporations and industries, through thorough statistical analysis. These findings are expected to be of considerable value to future researchers who focus on industries that disclose more or less information.
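One simple way to compare industries by how often they disclose each indicator is sketched below. It is an illustration only, not the statistical procedure used in the thesis. It assumes each report has already been reduced to an `(industry, set of disclosed indicator codes)` pair. It computes, per industry, the fraction of reports that disclose each indicator, then ranks indicators from most to least disclosed. The function names and the toy data are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def disclosure_rates(
    reports: Iterable[tuple[str, set[str]]],
) -> dict[str, dict[str, float]]:
    """Map industry -> indicator -> fraction of that industry's reports disclosing it."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for industry, disclosed in reports:
        totals[industry] += 1
        counts[industry].update(set(disclosed))  # count each indicator once per report
    return {
        industry: {code: n / totals[industry] for code, n in counts[industry].items()}
        for industry in totals
    }


def most_and_least(
    rates: dict[str, float], indicator_universe: set[str], k: int = 3
) -> tuple[list[str], list[str]]:
    """Return the k most and k least disclosed indicators.

    Indicators never disclosed get rate 0.0, which is why the full universe of
    indicators is passed in. Ties are broken alphabetically by code.
    """
    ranked = sorted(indicator_universe, key=lambda code: (-rates.get(code, 0.0), code))
    return ranked[:k], ranked[-k:]


# Toy usage with hypothetical data.
reports = [("Mining", {"EN1", "EN2"}), ("Mining", {"EN1"}), ("Banking", {"EC1"})]
rates = disclosure_rates(reports)
most, least = most_and_least(rates["Mining"], {"EN1", "EN2", "EC1"}, k=1)
```

Passing the full indicator universe matters. Otherwise, indicators that no report in an industry discloses would silently drop out of the "least disclosed" ranking.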