Science & Technology

Building explainability into the components of machine-learning models

Researchers develop tools to help data scientists make the features used in machine-learning models more understandable for end users

EP Staff

Written by Adam Zewe, MIT News Office

Explanation methods that help users understand and trust machine-learning models often describe how much certain features used in the model contribute to its prediction. For example, if a model predicts a patient’s risk of developing cardiac disease, a physician might want to know how strongly the patient’s heart rate data influences that prediction.

But if those features are so complex or convoluted that the user can’t understand them, does the explanation method do any good?

MIT researchers are striving to improve the interpretability of features so decision makers will be more comfortable using the outputs of machine-learning models. Drawing on years of field work, they developed a taxonomy to help developers craft features that will be easier for their target audience to understand.

“We found that out in the real world, even though we were using state-of-the-art ways of explaining machine-learning models, there is still a lot of confusion stemming from the features, not from the model itself,” says Alexandra Zytek, an electrical engineering and computer science PhD student and lead author of a paper introducing the taxonomy.

To build the taxonomy, the researchers defined properties that make features interpretable for five types of users, from artificial intelligence experts to the people affected by a machine-learning model’s prediction. They also offer instructions for how model creators can transform features into formats that will be easier for a layperson to comprehend.

They hope their work will inspire model builders to consider using interpretable features from the beginning of the development process, rather than trying to work backward and focus on explainability after the fact.

MIT co-authors include Dongyu Liu, a postdoc; visiting professor Laure Berti-Équille, research director at IRD; and senior author Kalyan Veeramachaneni, principal research scientist in the Laboratory for Information and Decision Systems (LIDS) and leader of the Data to AI group. They are joined by Ignacio Arnaldo, a principal data scientist at Corelight. The research is published in the June edition of the Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining’s peer-reviewed Explorations Newsletter.

Real-world lessons

Features are input variables that are fed to machine-learning models; they are usually drawn from the columns in a dataset. Data scientists typically select and handcraft features for the model, and they mainly focus on ensuring features are developed to improve model accuracy, not on whether a decision-maker can understand them, Veeramachaneni explains.

For several years, he and his team have worked with decision makers to identify machine-learning usability challenges. These domain experts, most of whom lack machine-learning knowledge, often don’t trust models because they don’t understand the features that influence predictions.

For one project, they partnered with clinicians in a hospital ICU who used machine learning to predict the risk a patient will face complications after cardiac surgery. Some features were presented as aggregated values, like the trend of a patient’s heart rate over time. While features coded this way were “model ready” (the model could process the data), clinicians didn’t understand how they were computed. They would rather see how these aggregated features relate to original values, so they could identify anomalies in a patient’s heart rate, Liu says.
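To make the distinction concrete, here is a minimal Python sketch of an aggregated "trend" feature kept alongside the raw readings it was derived from. The readings, the function name `heart_rate_trend`, and the choice of a least-squares slope as the definition of "trend" are illustrative assumptions, not the method used in the ICU project.

```python
def heart_rate_trend(readings):
    """Least-squares slope of heart rate (beats per minute) per time step.

    Assumes evenly spaced readings; this is one plausible definition of
    "trend", chosen for illustration only.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to compute a trend")
    times = range(n)
    mean_t = sum(times) / n
    mean_r = sum(readings) / n
    covariance = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, readings))
    variance = sum((t - mean_t) ** 2 for t in times)
    return covariance / variance


# Hypothetical hourly readings for one patient.
raw_readings = [72, 74, 76, 78, 80]

feature = {
    # Model-ready aggregate: a single number per patient.
    "heart_rate_trend_bpm_per_hour": heart_rate_trend(raw_readings),
    # Original values a clinician can inspect for anomalies.
    "heart_rate_raw_bpm": raw_readings,
}
```

Keeping both entries lets an explanation point from the aggregate back to the values a clinician already knows how to read.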

By contrast, a group of learning scientists preferred features that were aggregated. Instead of having a feature like “number of posts a student made on discussion forums” they would rather have related features grouped together and labeled with terms they understood, like “participation.”

“With interpretability, one size doesn’t fit all. When you go from area to area, there are different needs. And interpretability itself has many levels,” Veeramachaneni says.

The idea that one size doesn’t fit all is key to the researchers’ taxonomy. They define properties that can make features more or less interpretable for different decision makers and outline which properties are likely most important to specific users.

For instance, machine-learning developers might focus on having features that are compatible with the model and predictive, meaning they are expected to improve the model’s performance.

On the other hand, decision makers with no machine-learning experience might be better served by features that are human-worded, meaning they are described in a way that is natural for users, and understandable, meaning they refer to real-world metrics users can reason about.

“The taxonomy says, if you are making interpretable features, to what level are they interpretable? You may not need all levels, depending on the type of domain experts you are working with,” Zytek says.

Putting interpretability first

The researchers also outline feature engineering techniques a developer can employ to make features more interpretable for a specific audience.

Feature engineering is a process in which data scientists transform data into a format machine-learning models can process, using techniques like aggregating data or normalizing values. Most models also can’t process categorical data unless they are converted to a numerical code. These transformations are often nearly impossible for laypeople to unpack.
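As a rough illustration of why such transformations are hard to unpack, the sketch below applies two of them, normalization and a numerical encoding of categories, in plain Python. The column values and helper names are made up for this example, and min-max scaling and one-hot encoding are just one common choice for each step.

```python
def min_max_normalize(values):
    """Rescale numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot_encode(values):
    """Turn each category into a list of 0/1 flags, one per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical raw columns.
ages = [20, 35, 50]
blood_types = ["A", "O", "A"]

normalized_ages = min_max_normalize(ages)  # [0.0, 0.5, 1.0]
categories, encoded_types = one_hot_encode(blood_types)
# categories    -> ["A", "O"]
# encoded_types -> [[1, 0], [0, 1], [1, 0]]
```

The model sees only `0.5` and `[1, 0]`; a layperson would need "35 years old" and "type A" to make sense of the same record.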

Creating interpretable features might involve undoing some of that encoding, Zytek says. For instance, a common feature engineering technique organizes spans of data so they all contain the same number of years. To make these features more interpretable, one could group age ranges using human terms, like infant, toddler, child, and teen. Or rather than using a transformed feature like average pulse rate, an interpretable feature might simply be the actual pulse rate data, Liu adds.
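The age-range idea can be sketched in a few lines of Python. The article names the labels but not the boundaries, so the cutoffs below, the extra "adult" label, and the function name `age_group` are assumptions made purely for illustration.

```python
from bisect import bisect_right

# Assumed upper bounds in years for each label; not taken from the paper.
AGE_UPPER_BOUNDS = [1, 3, 13, 20]
AGE_LABELS = ["infant", "toddler", "child", "teen", "adult"]


def age_group(age_years):
    """Map an age in years to a human-readable label."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    # bisect_right finds how many upper bounds the age has reached or passed.
    return AGE_LABELS[bisect_right(AGE_UPPER_BOUNDS, age_years)]


labels = [age_group(a) for a in [0.5, 2, 8, 15, 40]]
# labels -> ["infant", "toddler", "child", "teen", "adult"]
```

Unlike equal-width bins, the groups here are uneven in length but match terms a non-specialist already uses.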

“In a lot of domains, the tradeoff between interpretable features and model accuracy is actually very small. When we were working with child welfare screeners, for example, we retrained the model using only features that met our definitions for interpretability, and the performance decrease was almost negligible,” Zytek says.

Building off this work, the researchers are developing a system that enables a model developer to handle complicated feature transformations in a more efficient manner, to create human-centered explanations for machine-learning models. This new system will also convert algorithms designed to explain model-ready datasets into formats that can be understood by decision makers.

Science & Technology

Passive cooling system could benefit off-grid locations

Relying on evaporation and radiation — but not electricity — the system could keep food fresh longer or supplement air conditioning in buildings

EP Staff

Written by David L. Chandler, MIT News Office

As the world gets warmer, the use of power-hungry air conditioning systems is projected to increase significantly, putting a strain on existing power grids and bypassing many locations with little or no reliable electric power. Now, an innovative system developed at MIT offers a way to use passive cooling to preserve food crops and supplement conventional air conditioners in buildings, with no need for power and only a small need for water.

The system, which combines radiative cooling, evaporative cooling, and thermal insulation in a slim package that could resemble existing solar panels, can provide up to about 17 degrees Fahrenheit (9.3 degrees Celsius) of cooling from the ambient temperature, enough to permit safe food storage for about 40 percent longer under very humid conditions. It could triple the safe storage time under drier conditions.

The findings are reported in the journal Cell Reports Physical Science, in a paper by MIT postdoc Zhengmao Lu, Arny Leroy PhD ’21, professors Jeffrey Grossman and Evelyn Wang, and two others. While more research is needed in order to bring down the cost of one key component of the system, the researchers say that eventually such a system could play a significant role in meeting the cooling needs of many parts of the world where a lack of electricity or water limits the use of conventional cooling systems.

The system cleverly combines previous standalone cooling designs that each provide limited amounts of cooling power, in order to produce significantly more cooling overall — enough to help reduce food losses from spoilage in parts of the world that are already suffering from limited food supplies. In recognition of that potential, the research team has been partly supported by MIT’s Abdul Latif Jameel Water and Food Systems Lab.

“This technology combines some of the good features of previous technologies such as evaporative cooling and radiative cooling,” Lu says. By using this combination, he says, “we show that you can achieve significant food life extension, even in areas where you have high humidity,” which limits the capabilities of conventional evaporative or radiative cooling systems.

In places that do have existing air conditioning systems in buildings, the new system could be used to significantly reduce the load on these systems by sending cool water to the hottest part of the system, the condenser. “By lowering the condenser temperature, you can effectively increase the air conditioner efficiency, so that way you can potentially save energy,” Lu says.

Other groups have also been pursuing passive cooling technologies, he says, but “by combining those features in a synergistic way, we are now able to achieve high cooling performance, even in high-humidity areas where previous technology generally cannot perform well.”

The system consists of three layers of material, which together provide cooling as water and heat pass through the device. In practice, the device could resemble a conventional solar panel, but instead of putting out electricity, it would directly provide cooling, for example by acting as the roof of a food storage container. Or, it could be used to send chilled water through pipes to cool parts of an existing air conditioning system and improve its efficiency. The only maintenance required is adding water for the evaporation, but the consumption is so low that this need only be done about once every four days in the hottest, driest areas, and only once a month in wetter areas.

The top layer is an aerogel, a material consisting mostly of air enclosed in the cavities of a sponge-like structure made of polyethylene. The material is highly insulating but freely allows both water vapor and infrared radiation to pass through. The evaporation of water (rising up from the layer below) provides some of the cooling power, while the infrared radiation, taking advantage of the extreme transparency of Earth’s atmosphere at those wavelengths, radiates some of the heat straight up through the air and into space — unlike air conditioners, which spew hot air into the immediate surrounding environment.

Below the aerogel is a layer of hydrogel, another sponge-like material, but one whose pore spaces are filled with water rather than air. It’s similar to material currently used commercially for products such as cooling pads or wound dressings. This provides the water source for evaporative cooling, as water vapor forms at its surface and passes up through the aerogel layer and out to the environment.

Below that, a mirror-like layer reflects any incoming sunlight that has reached it, sending it back up through the device rather than letting it heat the materials, which reduces their thermal load. And the top layer of aerogel, besides being a good insulator, is also highly solar-reflecting, limiting the amount of solar heating of the device, even under strong direct sunlight.

“The novelty here is really just bringing together the radiative cooling feature, the evaporative cooling feature, and also the thermal insulation feature all together in one architecture,” Lu explains. A small version of the system, just 4 inches across, was tested on the rooftop of a building at MIT, where it proved effective even during suboptimal weather conditions, Lu says, achieving 9.3 C of cooling (about 16.7 F).

“The challenge previously was that evaporative materials often do not deal with solar absorption well,” Lu says. “With these other materials, usually when they’re under the sun, they get heated, so they are unable to get to high cooling power at the ambient temperature.”

The aerogel material’s properties are key to the system’s overall efficiency, but that material is currently expensive to produce, as it requires special equipment for critical point drying (CPD) to remove solvents slowly from the delicate porous structure without damaging it. The key property that must be controlled is the size of the pores in the aerogel, which is made by mixing the polyethylene material with solvents, allowing it to set like a bowl of Jell-O, and then getting the solvents out of it. The research team is currently exploring ways of either making this drying process less expensive, such as by using freeze-drying, or finding alternative materials that can provide the same insulating function at lower cost, such as membranes separated by an air gap.

While the other materials used in the system are readily available and relatively inexpensive, Lu says, “the aerogel is the only material that’s a product from the lab that requires further development in terms of mass production.” And it’s impossible to predict how long that development might take before this system can be made practical for widespread use, he says.

The research team included Lenan Zhang of MIT’s Department of Mechanical Engineering and Jatin Patil of the Department of Materials Science and Engineering.

Science & Technology

Collaborative machine learning that preserves privacy

Researchers increase the accuracy and efficiency of a machine-learning method that safeguards user data

EP Staff

Written by Adam Zewe, MIT News Office

Training a machine-learning model to effectively perform a task, such as image classification, involves showing the model thousands, millions, or even billions of example images. Gathering such enormous datasets can be especially challenging when privacy is a concern, such as with medical images. Researchers from MIT and the MIT-born startup DynamoFL have now taken one popular solution to this problem, known as federated learning, and made it faster and more accurate.

Federated learning is a collaborative method for training a machine-learning model that keeps sensitive user data private. Hundreds or thousands of users each train their own model using their own data on their own device. Then users transfer their models to a central server, which combines them to come up with a better model that it sends back to all users.
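The combining step can be sketched as a weighted average of the parameters each user sends back. This is a generic federated-averaging illustration in plain Python, not the researchers' system: the function name, the toy two-parameter models, and the choice to weight each user by the size of its local dataset are all assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Combine client models by a weighted average of their parameters.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes: number of local training examples per client, used as weights.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client model")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * size for w, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Three hypothetical clients, each holding a two-parameter model trained locally.
local_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
local_sizes = [10, 10, 20]

# Only the parameters travel to the server; the training data never leaves a client.
global_model = federated_average(local_models, local_sizes)
# global_model -> [3.5, 4.5]
```

In a real deployment this exchange repeats over many rounds, with the server sending `global_model` back to every client as the starting point for the next round of local training.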

A collection of hospitals located around the world, for example, could use this method to train a machine-learning model that identifies brain tumors in medical images, while keeping patient data secure on their local servers.

But federated learning has some drawbacks. Transferring a large machine-learning model to and from a central server involves moving a lot of data, which has high communication costs, especially since the model must be sent back and forth dozens or even hundreds of times. Plus, each user gathers their own data, so those data don’t necessarily follow the same statistical patterns, which hampers the performance of the combined model. And that combined model is made by taking an average — it is not personalized for each user.

The researchers developed a technique that can simultaneously address these three problems of federated learning. Their method boosts the accuracy of the combined machine-learning model while significantly reducing its size, which speeds up communication between users and the central server. It also ensures that each user receives a model that is more personalized for their environment, which improves performance.

The researchers were able to reduce the model size by nearly an order of magnitude when compared to other techniques, which led to communication costs that were between four and six times lower for individual users. Their technique was also able to increase the model’s overall accuracy by about 10 percent.

“A lot of papers have addressed one of the problems of federated learning, but the challenge was to put all of this together. Algorithms that focus just on personalization or communication efficiency don’t provide a good enough solution. We wanted to be sure we were able to optimize for everything, so this technique could actually be used in the real world,” says Vaikkunth Mugunthan PhD ’22, lead author of a paper that introduces this technique.

Mugunthan wrote the paper with his advisor, senior author Lalana Kagal, a principal research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL). The work will be presented at the European Conference on Computer Vision.

Cutting a model down to size

The system the researchers developed, called FedLTN, relies on an idea in machine learning known as the lottery ticket hypothesis. This hypothesis says that within very large neural network models there exist much smaller subnetworks that can achieve the same performance. Finding one of these subnetworks is akin to finding a winning lottery ticket. (LTN stands for “lottery ticket network.”)

Neural networks, loosely based on the human brain, are machine-learning models that learn to solve problems using interconnected layers of nodes, or neurons.

Finding a winning lottery ticket network is more complicated than a simple scratch-off. The researchers must use a process called iterative pruning. If the model’s accuracy is above a set threshold, they remove nodes and the connections between them (just like pruning branches off a bush) and then test the leaner neural network to see if the accuracy remains above the threshold.
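The prune-then-test loop described above can be sketched as follows. This is a toy illustration, not FedLTN: it zeroes individual small-magnitude weights rather than removing whole nodes, and `toy_accuracy` is a hypothetical stand-in for evaluating a real network on held-out data.

```python
def iterative_prune(weights, evaluate, threshold, fraction=0.2):
    """Repeatedly zero out the smallest-magnitude weights.

    Returns the last version whose accuracy stayed at or above `threshold`.
    `evaluate` is a caller-supplied function mapping a weight list to an
    accuracy in [0, 1].
    """
    current = list(weights)
    while evaluate(current) >= threshold:
        alive = [i for i, w in enumerate(current) if w != 0.0]
        n_remove = max(1, int(len(alive) * fraction))
        if len(alive) <= n_remove:
            break  # nothing sensible left to prune
        candidate = list(current)
        for i in sorted(alive, key=lambda i: abs(current[i]))[:n_remove]:
            candidate[i] = 0.0
        if evaluate(candidate) < threshold:
            break  # this round pruned too much; keep the previous network
        current = candidate
    return current


# Hypothetical weights; the stand-in "accuracy" is the share of total
# weight magnitude that survives pruning.
original = [0.05, -0.1, 0.9, 0.02, -0.7, 0.03, 0.6, 0.01]
TOTAL_MAGNITUDE = sum(abs(w) for w in original)


def toy_accuracy(ws):
    return sum(abs(w) for w in ws) / TOTAL_MAGNITUDE


pruned = iterative_prune(original, toy_accuracy, threshold=0.95, fraction=0.25)
# pruned -> [0.0, -0.1, 0.9, 0.0, -0.7, 0.0, 0.6, 0.0]
```

Half of the weights are removed before the stand-in accuracy would drop below the threshold, which is the sense in which a much smaller subnetwork can match the larger one.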

Other methods have used this pruning technique for federated learning to create smaller machine-learning models which could be transferred more efficiently. But while these methods may speed things up, model performance suffers.

Mugunthan and Kagal applied a few novel techniques to accelerate the pruning process while making the new, smaller models more accurate and personalized for each user.

They accelerated pruning by avoiding a step where the remaining parts of the pruned neural network are “rewound” to their original values. They also trained the model before pruning it, which makes it more accurate so it can be pruned at a faster rate, Mugunthan explains.

To make each model more personalized for the user’s environment, they were careful not to prune away layers in the network that capture important statistical information about that user’s specific data. In addition, when the models were all combined, they made use of information stored in the central server so it wasn’t starting from scratch for each round of communication.

They also developed a technique to reduce the number of communication rounds for users with resource-constrained devices, like a smartphone on a slow network. These users start the federated learning process with a leaner model that has already been optimized by a subset of other users.

Winning big with lottery ticket networks

When they put FedLTN to the test in simulations, it led to better performance and reduced communication costs across the board. In one experiment, a traditional federated learning approach produced a model that was 45 megabytes in size, while their technique generated a model with the same accuracy that was only 5 megabytes. In another test, a state-of-the-art technique required 12,000 megabytes of communication between users and the server to train one model, whereas FedLTN only required 4,500 megabytes.
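For readers who want the ratios behind those figures, the short Python calculation below works them out. It uses only the four numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the article, in megabytes.
baseline_model_mb = 45
fedltn_model_mb = 5
baseline_comm_mb = 12_000
fedltn_comm_mb = 4_500

size_reduction = baseline_model_mb / fedltn_model_mb  # 9.0
comm_reduction = baseline_comm_mb / fedltn_comm_mb    # about 2.67
comm_saved_mb = baseline_comm_mb - fedltn_comm_mb     # 7500

print(f"model is {size_reduction:.1f}x smaller")
print(f"communication is {comm_reduction:.2f}x lower, saving {comm_saved_mb} MB")
```

In these two experiments, the model is nine times smaller and training moves roughly 2.7 times less data between users and the server.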

With FedLTN, the worst-performing clients still saw a performance boost of more than 10 percent. And the overall model accuracy beat the state-of-the-art personalization algorithm by nearly 10 percent, Mugunthan adds.

Now that they have developed and fine-tuned FedLTN, Mugunthan is working to integrate the technique into a federated learning startup he recently founded, DynamoFL.

Moving forward, he hopes to continue enhancing this method. For instance, the researchers have demonstrated success using datasets that had labels, but a greater challenge would be applying the same techniques to unlabeled data, he says.

Mugunthan is hopeful this work inspires other researchers to rethink how they approach federated learning.

“This work shows the importance of thinking about these problems from a holistic aspect, and not just individual metrics that have to be improved. Sometimes, improving one metric can actually cause a downgrade in the other metrics. Instead, we should be focusing on how we can improve a bunch of things together, which is really important if it is to be deployed in the real world,” he says.

Science & Technology

Scientists identify a plant molecule that sops up iron-rich heme

The peptide is used by legumes to control nitrogen-fixing bacteria; it may also offer leads for treating patients with too much heme in their blood

EP Staff

Written by Anne Trafton, MIT News Office

Symbiotic relationships between legumes and the bacteria that grow in their roots are critical for plant survival. Without those bacteria, the plants would have no source of nitrogen, an element that is essential for building proteins and other biomolecules, and they would be dependent on nitrogen fertilizer in the soil. 

To establish that symbiosis, some legume plants produce hundreds of peptides that help bacteria live within structures known as nodules within their roots. A new study from MIT reveals that one of these peptides has an unexpected function: It sops up all available heme, an iron-containing molecule. This sends the bacteria into an iron-starvation mode that ramps up their production of ammonia, the form of nitrogen that is usable for plants.

“This is the first of the 700 peptides in this system for which a really detailed molecular mechanism has been worked out,” says Graham Walker, the American Cancer Society Research Professor of Biology at MIT, a Howard Hughes Medical Institute Professor, and the senior author of the study.

This heme-sequestering peptide could have beneficial uses in treating a variety of human diseases, the researchers say. Removing free heme from the blood could help to treat diseases caused by bacteria or parasites that need heme to survive, such as P. gingivalis (periodontal disease) or toxoplasmosis, or diseases such as sickle cell disease or sepsis that release too much heme into the bloodstream.

“This study demonstrates that basic research in plant-microbe interactions also has potential to be translated to therapeutic applications,” says Siva Sankari, an MIT research scientist and the lead author of the study, which appears today in Nature Microbiology.

Other authors of the paper include Vignesh Babu, an MIT research scientist; Kevin Bian and Mary Andorfer, both MIT postdocs; Areej Alhhazmi, a former KACST-MIT Ibn Khaldun Fellowship for Saudi Arabian Women scholar; Kwan Yoon and Dante Avalos, MIT graduate students; Tyler Smith, an MIT instructor in biology; Catherine Drennan, an MIT professor of chemistry and biology and a Howard Hughes Medical Institute investigator; Michael Yaffe, a David H. Koch Professor of Science and a member of MIT’s Koch Institute for Integrative Cancer Research; and Sebastian Lourido, the Latham Family Career Development Professor of Biology at MIT and a member of the Whitehead Institute for Biomedical Research.

Iron control

For nearly 40 years, Walker’s lab has been studying the symbiosis between legumes and rhizobia, a type of nitrogen-fixing bacteria. These bacteria convert nitrogen gas to ammonia, a critical step of the Earth’s nitrogen cycle that makes the element available to plants (and to animals that eat the plants).

Most of Walker’s work has focused on a clover-like plant called Medicago truncatula. Nitrogen-fixing bacteria elicit the formation of nodules on the roots of these plants and eventually end up inside the plant cells, where they convert to their symbiotic form called bacteroids.

Several years ago, plant biologists discovered that Medicago truncatula produces about 700 peptides that contribute to the formation of these bacteroids. These peptides are generated in waves that help the bacteria make the transition from living freely to becoming embedded into plant cells where they act as nitrogen-fixing machines.

Walker and his students picked one of these peptides, known as NCR247, to dig into more deeply. Initial studies revealed that when nitrogen-fixing bacteria were exposed to this peptide, 15 percent of their genes were affected. Many of the genes that became more active were involved in importing iron.

The researchers then found that when they fused NCR247 to a larger protein, the hybrid protein was unexpectedly reddish in color. This serendipitous observation led to the discovery that NCR247 binds heme, an organic ring-shaped iron-containing molecule that is an important component of hemoglobin, the protein that red blood cells use to carry oxygen.

Further studies revealed that when NCR247 is released into bacterial cells, it sequesters most of the heme in the cell, sending the cells into an iron-starvation mode that triggers them to begin importing more iron from the external environment.

“Usually bacteria fine-tune their iron metabolism, and they don’t take up more iron when there is already enough,” Sankari says. “What’s cool about this peptide is that it overrides that mechanism and indirectly regulates the iron content of the bacteria.”

Nitrogenase, the main enzyme that bacteria use to fix nitrogen, requires 24 to 32 atoms of iron per enzyme molecule, so the influx of extra iron likely helps those enzymes to become more active, the researchers say. This influx is timed to coincide with nitrogen fixation, they found.


“These peptides are produced in a wave in the nodules, and the production of this particular peptide is higher when the bacteria are preparing to fix nitrogen. If this peptide was secreted throughout the whole process, then the cell would have too much iron all the time, which is bad for the cell,” Sankari says.

Without the NCR247 peptide, Medicago truncatula and rhizobium cannot form an effective nitrogen-fixing symbiosis, the researchers showed.

“Many possible directions”

The peptide that the researchers studied in this work may have potential therapeutic uses. When heme is incorporated into hemoglobin, it performs a critical function in the body, but when it’s loose in the bloodstream, it can kill cells and promote inflammation. Free heme can accumulate in stored blood, so having a way to filter out the heme before the blood is transfused into a patient could be potentially useful.

A variety of human diseases lead to free heme circulating in the bloodstream, including sickle cell anemia, sepsis, and malaria. Additionally, some infectious parasites and bacteria depend on heme for their survival but cannot produce it, so they scavenge it from their environment. Treating such infections with a protein that takes up all available heme could help prevent the parasitic or bacterial cells from being able to grow and reproduce.

In this study, Lourido and members of his lab showed that treating the parasite Toxoplasma gondii with NCR247 prevented the parasite from forming plaques on human cells.

The researchers are now pursuing collaborations with other labs at MIT to explore some of these potential applications, with funding from a Professor Amar G. Bose Research Grant.

“There are many possible directions, but they’re all at a very early stage,” Walker says. “The number of potential clinical applications is very broad. You can place more than one bet in this game, which is an intriguing thing.”

Currently, the human protein hemopexin, which also binds to heme, is being explored as a possible treatment for sickle cell anemia. The NCR247 peptide could provide an easier-to-deploy alternative, the researchers say, because it is much smaller and could be easier to manufacture and deliver into the body.

The research was funded in part by the MIT Center for Environmental Health Sciences, the National Science Foundation, and the National Institutes of Health.
