Minimize the carbon footprint of data analytics, maximize data center sustainability

Faster total time to insights is kinder to the environment.

More than ever, executives are under pressure to reduce their companies' environmental impact. This is especially true for data centers because of their contribution to global warming. If all the data centers in the world were a country, they would rank as the fifth-largest energy consumer in the world. In 2020, data centers accounted for about 1% of global electricity demand and contributed 0.3% of all CO2 emissions.

Today, companies are required to provide transparency about their carbon footprint, and data centers are racing to improve their efficiency rankings. There is a list of data centers around the world ranked by PUE (power usage effectiveness), and Greenpeace has prepared a clean-tech industry ranking of data centers based on their carbon footprint.

The need for greener code

Many data center sustainability initiatives are based on using renewable energy for cooling or optimizing cooling systems to reduce power consumption. However, in addition to the energy required to maintain environmental controls for data analysis, the software itself also has a significant effect on the amount of electricity consumed. How much? Quite a lot.

Based on current research, one large machine learning (ML) model, such as Meena, consumes as much energy as a passenger car driven 242,231 miles. Researchers from the University of Massachusetts at Amherst estimate that training a large deep-learning model produces 626,000 pounds of CO2, equivalent to the lifetime emissions of five cars.
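To put the second figure in more familiar units, here is a quick back-of-the-envelope conversion. It uses only the two numbers quoted above (626,000 pounds and five cars) plus the standard pound-to-kilogram factor; the variable names are purely illustrative.

```python
# Back-of-the-envelope conversion of the figures quoted above.
LB_TO_KG = 0.45359237  # exact definition of the international pound

training_lb = 626_000  # pounds of CO2 for training, as quoted in the article
cars = 5               # number of car lifetimes, as quoted in the article

training_tonnes = training_lb * LB_TO_KG / 1000  # metric tonnes of CO2
per_car_lb = training_lb / cars                  # implied lifetime emissions per car

print(f"{training_tonnes:.0f} tonnes of CO2 in total")
print(f"{per_car_lb:,.0f} pounds of CO2 per car lifetime")
```

In other words, the estimate works out to roughly 284 metric tonnes of CO2, or about 125,200 pounds for each car's lifetime.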

As a result, there is increased interest in, and commitment to, creating more efficient code. The Green Software Foundation (GSF), with members such as VMware, Microsoft, Accenture and GitHub, is on a mission to design and code software that uses less energy.

Tips for sustainable machine learning

There are several academic articles on writing greener algorithms for AI/ML models, but here are some basic tips.

One way to reduce computing resources is to minimize the number of training experiments. Hundreds of pre-trained ML models and blueprints are available, so developers only need to supply their own data to add AI capabilities to applications, significantly reducing the time it takes to develop and train models.

It is also important to understand the environmental footprint of an algorithm in order to decide how best to optimize its performance. Researchers from various universities have developed tools for this. For example, Green Algorithms calculates the carbon footprint of your cloud computing. Another example is CodeCarbon, a software package that integrates into a Python codebase and estimates the amount of CO2 produced by the computing resources used to run the code.
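To make the idea concrete, here is a minimal sketch of the kind of estimate such tools produce. It is not the code or the exact model used by Green Algorithms or CodeCarbon; it is a simplified approximation that multiplies run time by average power draw, scales the result by the data center's PUE, and converts energy to emissions with a grid carbon-intensity factor. The function name and all input values are hypothetical and chosen only for illustration.

```python
def estimate_emissions_kg(runtime_hours, avg_power_watts, pue, grid_kg_per_kwh):
    """Rough CO2-equivalent estimate (in kg) for a single compute job.

    Simplified model, for illustration only:
      IT energy (kWh)       = runtime_hours * avg_power_watts / 1000
      facility energy (kWh) = IT energy * PUE
      emissions (kg)        = facility energy * grid carbon intensity
    """
    if runtime_hours < 0 or avg_power_watts < 0 or grid_kg_per_kwh < 0:
        raise ValueError("runtime, power and carbon intensity must be non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    it_energy_kwh = runtime_hours * avg_power_watts / 1000.0
    facility_energy_kwh = it_energy_kwh * pue
    return facility_energy_kwh * grid_kg_per_kwh


# Hypothetical inputs: a 10-hour training run drawing 300 W on average,
# in a facility with a PUE of 1.5, on a grid emitting 0.4 kg CO2e per kWh.
estimate = estimate_emissions_kg(
    runtime_hours=10, avg_power_watts=300, pue=1.5, grid_kg_per_kwh=0.4
)
print(f"Estimated emissions: {estimate:.2f} kg CO2e")
```

Halving the run time halves the estimate in this model, which is why the rest of these tips focus on shortening training.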

Automation can also be used to shorten training time. It is possible to minimize the number of experiments and/or the amount of data analyzed while maintaining accuracy. More efficient data sampling alone can speed up the model run time by a factor of 5.8.
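One common way to analyze less data without distorting it is stratified sampling, which keeps the proportion of each class roughly the same in the subset as in the full dataset. The sketch below illustrates that general technique only; it is not the specific method behind the 5.8x figure, and the function name, the toy dataset and the 25% fraction are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Return a random subset that preserves each label's share of the data.

    rows     -- list of records
    label_of -- function that maps a record to its class label
    fraction -- share of each class to keep, in the range (0, 1]
    seed     -- fixed seed so the subset is reproducible
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    rng = random.Random(seed)

    # Group records by class label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    # Draw the same fraction from every class, keeping at least one record.
    sample = []
    for group in by_label.values():
        keep = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, keep))
    return sample


# Toy dataset: 20 "fraud" records and 180 "ok" records (hypothetical).
rows = [("fraud", i) for i in range(20)] + [("ok", i) for i in range(180)]
subset = stratified_sample(rows, label_of=lambda r: r[0], fraction=0.25)
print(len(subset), "of", len(rows), "records kept")
```

Because the rare class is sampled at the same rate as the common one, a model trained on the subset still sees both classes in their original ratio. Whether accuracy actually holds up has to be checked against the full dataset for each use case.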

The software used to perform the calculations can also help reduce the amount of computing resources required. Some databases are designed specifically for processing massive amounts of data and can optimize memory and storage usage to reduce power consumption. These databases also have the advantage of not needing to limit the amount of data being analyzed, which reduces the risk of compromising the accuracy of the model in an effort to speed up the run time.

Shortening the run time of the model, in addition to increasing energy efficiency, reduces the overall time to insights for mission-critical applications such as fraud detection, cybersecurity solutions, quality control, etc. More efficient code is not only better for the environment, but it is also good for business.

More potential customers want transparency about a company’s commitment to its green strategies, and having a “green” code standard could be an important first step. Employees want to work for an environmentally sensitive company that makes responsible decisions regarding the environment. In the future, cloud vendors may need insight into a workload’s carbon footprint, with fines for processing deemed excessive or unnecessary.

With the sheer volume of calculations needed to derive meaning from data and make better business decisions, corporate social responsibility isn’t just a nice thing to have; it has become a necessity.

Ohad Shalev is a strategic analyst at SQream

DataDecisionMakers

Welcome to the VentureBeat Community!

DataDecisionMakers is where experts, including the technical people who do data work, can share data-related insights and innovation.

