Connect with us

Science & Technology

Solving a machine-learning mystery

A new study shows how large language models like GPT-3 can learn a new task from just a few examples, without the need for any new training data

EP Staff

Published

on

Written by Adam Zewe, MIT News Office

Large language models like OpenAI’s GPT-3 are massive neural networks that can generate human-like text, from poetry to programming code. Trained using troves of internet data, these machine-learning models take a small bit of input text and then predict the text that is likely to come next.

But that’s not all these models can do. Researchers are exploring a curious phenomenon known as in-context learning, in which a large language model learns to accomplish a task after seeing only a few examples — despite the fact that it wasn’t trained for that task. For instance, someone could feed the model several example sentences and their sentiments (positive or negative), then prompt it with a new sentence, and the model can give the correct sentiment.

Typically, a machine-learning model like GPT-3 would need to be retrained with new data for this new task. During this training process, the model updates its parameters as it processes new information to learn the task. But with in-context learning, the model’s parameters aren’t updated, so it seems like the model learns a new task without learning anything at all.

Scientists from MIT, Google Research, and Stanford University are striving to unravel this mystery. They studied models that are very similar to large language models to see how they can learn without updating parameters.

The researchers’ theoretical results show that these massive neural network models are capable of containing smaller, simpler linear models buried inside them. The large model could then implement a simple learning algorithm to train this smaller, linear model to complete a new task, using only information already contained within the larger model. Its parameters remain fixed.

An important step toward understanding the mechanisms behind in-context learning, this research opens the door to more exploration around the learning algorithms these large models can implement, says Ekin Akyürek, a computer science graduate student and lead author of a paper exploring this phenomenon. With a better understanding of in-context learning, researchers could enable models to complete new tasks without the need for costly retraining.

“Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering. But now we can just feed it an input, five examples, and it accomplishes what we want. So in-context learning is a pretty exciting phenomenon,” Akyürek says.

Joining Akyürek on the paper are Dale Schuurmans, a research scientist at Google Brain and professor of computing science at the University of Alberta; as well as senior authors Jacob Andreas, the X Consortium Assistant Professor in the MIT Department of Electrical Engineering and Computer Science and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); Tengyu Ma, an assistant professor of computer science and statistics at Stanford; and Danny Zhou, principal scientist and research director at Google Brain. The research will be presented at the International Conference on Learning Representations.

A model within a model

In the machine-learning research community, many scientists have come to believe that large language models can perform in-context learning because of how they are trained, Akyürek says.

For instance, GPT-3 has hundreds of billions of parameters and was trained by reading huge swaths of text on the internet, from Wikipedia articles to Reddit posts. So, when someone shows the model examples of a new task, it has likely already seen something very similar because its training dataset included text from billions of websites. It repeats patterns it has seen during training, rather than learning to perform new tasks.

Akyürek hypothesized that in-context learners aren’t just matching previously seen patterns, but instead are actually learning to perform new tasks. He and others had experimented by giving these models prompts using synthetic data, which they could not have seen anywhere before, and found that the models could still learn from just a few examples. Akyürek and his colleagues thought that perhaps these neural network models have smaller machine-learning models inside them that the models can train to complete a new task.

“That could explain almost all of the learning phenomena that we have seen with these large models,” he says.

To test this hypothesis, the researchers used a neural network model called a transformer, which has the same architecture as GPT-3, but had been specifically trained for in-context learning.

By exploring this transformer’s architecture, they theoretically proved that it can write a linear model within its hidden states. A neural network is composed of many layers of interconnected nodes that process data. The hidden states are the layers between the input and output layers.

Their mathematical evaluations show that this linear model is written somewhere in the earliest layers of the transformer. The transformer can then update the linear model by implementing simple learning algorithms.

In essence, the model simulates and trains a smaller version of itself.

Probing hidden layers

The researchers explored this hypothesis using probing experiments, where they looked in the transformer’s hidden layers to try and recover a certain quantity.

“In this case, we tried to recover the actual solution to the linear model, and we could show that the parameter is written in the hidden states. This means the linear model is in there somewhere,” he says.

Building off this theoretical work, the researchers may be able to enable a transformer to perform in-context learning by adding just two layers to the neural network. There are still many technical details to work out before that would be possible, Akyürek cautions, but it could help engineers create models that can complete new tasks without the need for retraining with new data.

Moving forward, Akyürek plans to continue exploring in-context learning with functions that are more complex than the linear models they studied in this work. They could also apply these experiments to large language models to see whether their behaviors are also described by simple learning algorithms. In addition, he wants to dig deeper into the types of pretraining data that can enable in-context learning.

“With this work, people can now visualize how these models can learn from exemplars. So, my hope is that it changes some people’s views about in-context learning,” Akyürek says. “These models are not as dumb as people think. They don’t just memorize these tasks. They can learn new tasks, and we have shown how that can be done.”

Science & Technology

3D-printed revolving devices can sense how they are moving

A new system enables makers to incorporate sensors into gears and other rotational mechanisms with just one pass in a 3D printer

EP Staff

Published

on

**Images/video: https://news.mit.edu/2023/3d-printing-revolving-devices-sensors-0316

Written by Adam Zewe, MIT News Office

Integrating sensors into rotational mechanisms could make it possible for engineers to build smart hinges that know when a door has been opened, or gears inside a motor that tell a mechanic how fast they are rotating. MIT engineers have now developed a way to easily integrate sensors into these types of mechanisms, with 3D printing.

The researchers built a plugin for the computer-aided design software SolidWorks that automatically integrates sensors into a model of the mechanism, which could then be sent directly to the 3D printer for fabrication. Here, they used the system to design a wheel that measures distance as it rolls across a surface.
Credits:Credit: Courtesy of the researchers. Edited by MIT News

Even though advances in 3D printing enable rapid fabrication of rotational mechanisms, integrating sensors into the designs is still notoriously difficult. Due to the complexity of the rotating parts, sensors are typically embedded manually, after the device has already been produced.

However, manually integrating sensors is no easy task. Embed them inside a device and wires might get tangled in the rotating parts or obstruct their rotations, but mounting external sensors would increase the size of a mechanism and potentially limit its motion.

Instead, the new system the MIT researchers developed enables a maker to 3D print sensors directly into a mechanism’s moving parts using conductive 3D printing filament. This gives devices the ability to sense their angular position, rotation speed, and direction of rotation.

With their system, called MechSense, a maker can manufacture rotational mechanisms with integrated sensors in just one pass using a multi-material 3D printer. These types of printers utilize multiple materials at the same time to fabricate a device.

To streamline the fabrication process, the researchers built a plugin for the computer-aided design software SolidWorks that automatically integrates sensors into a model of the mechanism, which could then be sent directly to the 3D printer for fabrication.

MechSense could enable engineers to rapidly prototype devices with rotating parts, like turbines or motors, while incorporating sensing directly into the designs. It could be especially useful in creating tangible user interfaces for augmented reality environments, where sensing is critical for tracking a user’s movements and interaction with objects.

“A lot of the research that we do in our lab involves taking fabrication methods that factories or specialized institutions create and then making then accessible for people. 3D printing is a tool that a lot of people can afford to have in their homes. So how can we provide the average maker with the tools necessary to develop these types of interactive mechanisms? At the end of the day, this research all revolves around that goal,” says Marwa AlAlawi, a mechanical engineering graduate student and lead author of a paper on MechSense.

AlAlawi’s co-authors include Michael Wessely, a former postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) who is now an assistant professor at Aarhus University; and senior author Stefanie Mueller, an associate professor in the MIT departments of Electrical Engineering and Computer Science and Mechanical Engineering, and a member CSAIL; as well as others at MIT and collaborators from Accenture Labs. The research will be presented at the ACM CHI Conference on Human Factors in Computing Systems.

Built-in sensing

To incorporate sensors into a rotational mechanism in a way that would not disrupt the device’s movement, the researchers leveraged capacitive sensing.

A capacitor consists of two plates of conductive material that have an insulating material sandwiched between them. If the overlapping area or distance between the conductive plates is changed, perhaps by rotating the mechanism, a capacitive sensor can detect resulting changes in the electric field between the plates. That information could then be used to calculate speed, for instance.

“In capacitive sensing, you don’t necessarily need to have contact between the two opposing conductive plates to monitor changes in that specific sensor. We took advantage of that for our sensor design,” AlAlawi says.

Rotational mechanisms typically consist of a rotational element located above, below, or next to a stationary element, like a gear spinning on a static shaft above a flat surface. The spinning gear is the rotational element and the flat surface beneath it is the stationary element.

The MechSense sensor includes three patches made from conductive material that are printed into the stationary plate, with each patch separated from its neighbors by nonconductive material. A fourth patch of conductive material, which has the same area as the other three patches, is printed into the rotating plate.

As the device spins, the patch on the rotating plate, called a floating capacitor, overlaps each of the patches on the stationary plate in turn. As the overlap between the rotating patch and each stationary patch changes (from completely covered, to half covered, to not covered at all), each patch individually detects the resulting change in capacitance.

The floating capacitor is not connected to any circuitry, so wires won’t get tangled with rotating components. 

Rather, the stationary patches are wired to electronics that use software the researchers developed to convert raw sensor data into estimations of angular position, direction of rotation, and rotation speed.

Enabling rapid prototyping

To simplify the sensor integration process for a user, the researchers built a SolidWorks extension. A maker specifies the rotating and stationary parts of their mechanism, as well as the center of rotation, and then the system automatically adds sensor patches to the model.

“It doesn’t change the design at all. It just replaces part of the device with a different material, in this case conductive material,” AlAlawi says.

The researchers used their system to prototype several devices, including a smart desk lamp that changes the color and brightness of its light depending on how the user rotates the bottom or middle of the lamp. They also produced a planetary gearbox, like those that are used in robotic arms, and a wheel that measures distance as it rolls across a surface.

As they prototyped, the team also conducted technical experiments to fine-tune their sensor design. They found that, as they reduced the size of the patches, the amount of error in the sensor data increased.

“In an effort to generate electronic devices with very little e-waste, we want devices with smaller footprints that can still perform well. If we take our same approach and perhaps use a different material or manufacturing process, I think we can scale down while accumulating less error using the same geometry,” she says.

In addition to testing different materials, AlAlawi and her collaborators plan to explore how they could increase the robustness of their sensor design to external noise, and also develop printable sensors for other types of moving mechanisms.

This research was funded, in part, by Accenture Labs.

Continue Reading

Science & Technology

Where the sidewalk ends

Most cities don’t map their own pedestrian networks. Now, researchers have built the first open-source tool to let planners do just that

Avatar

Published

on

Written by Peter Dizikes, MIT News Office

It’s easier than ever to view maps of any place you’d like to go — by car, that is. By foot is another matter. Most cities and towns in the U.S. do not have sidewalk maps, and pedestrians are usually left to fend for themselves: Can you walk from your hotel to the restaurants on the other side of the highway? Is there a shortcut from downtown to the sports arena? And how do you get to that bus stop, anyway?  

Now MIT researchers, along with colleagues from multiple other universities, have developed an open-source tool that uses aerial imagery and image-recognition to create complete maps of sidewalks and crosswalks. The tool can help planners, policymakers, and urbanists who want to expand pedestrian infrastructure.

“In the urban planning and urban policy fields, this is a huge gap,” says Andres Sevtsuk, an associate professor at MIT and a co-author of a new paper detailing the tool’s capabilities. “Most U.S. city governments know very little about their sidewalk networks. There is no data on it. The private sector hasn’t taken on the task of mapping it. It seemed like a really important technology to develop, especially in an open-source way that can be used by other places.”

The tool, called TILE2NET, has been developed using a few U.S. areas as initial sources of data, but it can be refined and adapted for use anywhere.

“We thought we needed a method that can be scalable and used in different cities,” says Maryam Hosseini, a postdoc in MIT’s City Form Lab in the Department of Urban Studies and Planning (DUSP), whose research has focused extensively on the development of the tool.

The paper, “Mapping the Walk: A Scalable Computer Vision Approach for Generating Sidewalk Network Datasets from Aerial Imagery,” appears online in the journal Computers, Environment and Urban Systems. The authors are Hosseini; Sevtsuk, who is the Charles and Ann Spaulding Career Development Associate Professor of Urban Science and Planning in DUSP and head of MIT’s City Form Lab; Fabio Miranda, an assistant professor of computer science at the University of Illinois at Chicago; Roberto M. Cesar, a professor of computer science at the University of Sao Paulo; and Claudio T. Silva, Institute Professor of Computer Science and Engineering at New York University (NYU) Tandon School of Engineering, and professor of data science at the NYU Center for Data Science.

Significant research for the project was conducted at NYU when Hosseini was a student there, working with Silva as a co-advisor.

There are multiple ways to attempt to map sidewalks and other pedestrian pathways in cities and towns. Planners could make maps manually, which is accurate but time-consuming; or they could use roads and make assumptions about the extent of sidewalks, which would reduce accuracy; or they could try tracking pedestrians, which probably would be limited in showing the full reach of walking networks.

Instead, the research team used computerized image-recognition techniques to build a tool that will visually recognize sidewalks, crosswalks, and footpaths. To do that, the researchers first used 20,000 aerial images from Boston, Cambridge, New York City, and Washington — places where comprehensive pedestrian maps already existed. By training the image-recognition model on such clearly defined objects and using portions of those cities as a starting point, they were able to see how well TILE2NET would work elsewhere in those cities.

Ultimately the tool worked well, recognizing 90 percent or more of all sidewalks and crosswalks in Boston and Cambridge, for instance. Having been trained visually on those cities, the tool can be applied to other metro areas; people elsewhere can now plug their aerial imagery into TILE2NET as well.

“We wanted to make it easier for cities in different parts of the world to do such a thing without needing to do the heavy lifting of training [the tool],” says Hosseini. “Collaboratively we will make it better and better, hopefully, as we go along.”

The need for such a tool is vast, emphasizes Sevtsuk, whose research centers on pedestrian and nonmotorized movement in cities, and who has developed multiple kinds of pedestrian-mapping tools in his career. Most cities have wildly incomplete networks of sidewalks and paths for pedestrians, he notes. And yet it is hard to expand those networks efficiently without mapping them.

“Imagine that we had the same gaps in car networks that pedestrians have in their networks,” Sevtsuk says. “You would drive to an intersection and then the road just ends. Or you can’t take a right turn since there is no road. That’s what [pedestrians] are constantly up against, and we don’t realize how important continuity is for [pedestrian] networks.”

In the still larger picture, Sevtsuk observes, the continuation of climate change means that cities will have to expand their infrastructure for pedestrians and cyclists, among other measures; transportation remains a huge source of carbon dioxide emissions.

“When cities talk about cutting carbon emissions, there’s no other way to make a big dent than to address transportation,” Sevtsuk says. “The whole world of urban data for public transit and pedestrians and bicycles is really far behind [vehicle data] in quality. Analyzing how cities can be operational without a car requires this kind of data.”

On the bright side, Sevtsuk suggests, adding pedestrian and bike infrastructure “is being done more aggressively than in many decades in the past. In the 20th century, it was the other way around, we would take away sidewalks to make space for vehicular roads. We’re now seeing the opposite trend. To make best use of pedestrian infrastructure, it’s important that cities have the network data about it. Now you can truly tell how somebody can get to a bus stop.”

Continue Reading

Science & Technology

Low-cost device can measure air pollution anywhere

Open-source tool from MIT’s Senseable City Lab lets people check air quality, cheaply

Avatar

Published

on

Written by Peter Dizikes, MIT News Office

Air pollution is a major public health problem: The World Health Organization has estimated that it leads to over 4 million premature deaths worldwide annually. Still, it is not always extensively measured. But now an MIT research team is rolling out an open-source version of a low-cost, mobile pollution detector that could enable people to track air quality more widely.

The detector, called Flatburn, can be made by 3D printing or by ordering inexpensive parts. The researchers have now tested and calibrated it in relation to existing state-of-the-art machines, and are publicly releasing all the information about it — how to build it, use it, and interpret the data.

“The goal is for community groups or individual citizens anywhere to be able to measure local air pollution, identify its sources, and, ideally, create feedback loops with officials and stakeholders to create cleaner conditions,” says Carlo Ratti, director of MIT’s Senseable City Lab. 

“We’ve been doing several pilots around the world, and we have refined a set of prototypes, with hardware, software, and protocols, to make sure the data we collect are robust from an environmental science point of view,” says Simone Mora, a research scientist at Senseable City Lab and co-author of a newly published paper detailing the scanner’s testing process. The Flatburn device is part of a larger project, known as City Scanner, using mobile devices to better understand urban life.

“Hopefully with the release of the open-source Flatburn we can get grassroots groups, as well as communities in less developed countries, to follow our approach and build and share knowledge,” says An Wang, a researcher at Senseable City Lab and another of the paper’s co-authors.

The paper, “Leveraging Machine Learning Algorithms to Advance Low-Cost Air Sensor Calibration in Stationary and Mobile Settings,” appears in the journal Atmospheric Environment.

In addition to Wang, Mora, and Ratti the study’s authors are: Yuki Machida, a former research fellow at Senseable City Lab; Priyanka deSouza, an assistant professor of urban and regional planning at the University of Colorado at Denver; Tiffany Duhl, a researcher with the Massachusetts Department of Environmental Protection and a Tufts University research associate at the time of the project; Neelakshi Hudda, a research assistant professor at Tufts University; John L. Durant, a professor of civil and environmental engineering at Tufts University; and Fabio Duarte, principal research scientist at Senseable City Lab.

The Flatburn concept at Senseable City Lab dates back to about 2017, when MIT researchers began prototyping a mobile pollution detector, originally to be deployed on garbage trucks in Cambridge, Massachusetts. The detectors are battery-powered and rechargable, either from power sources or a solar panel, with data stored on a card in the device that can be accessed remotely.

The current extension of that project involved testing the devices in New York City and the Boston area, by seeing how they performed in comparison to already-working pollution detection systems. In New York, the researchers used 5 detectors to collect 1.6 million data points over four weeks in 2021, working with state officials to compare the results. In Boston, the team used mobile sensors, evaluating the Flatburn devices against a state-of-the-art system deployed by Tufts University along with a state agency.

In both cases, the detectors were set up to measure concentrations of fine particulate matter as well as nitrogen dioxide, over an area of about 10 meters. Fine particular matter refers to tiny particles often associated with burning matter, from power plants, internal combustion engines in autos and fires, and more.

The research team found that the mobile detectors estimated somewhat lower concentrations of fine particulate matter than the devices already in use, but with a strong enough correlation so that, with adjustments for weather conditions and other factors, the Flatburn devices can produce reliable results.

“After following their deployment for a few months we can confidently say our low-cost monitors should behave the same way [as standard detectors],” Wang says. “We have a big vision, but we still have to make sure the data we collect is valid and can be used for regulatory and policy purposes,”

Duarte adds: “If you follow these procedures with low-cost sensors you can still acquire good enough data to go back to [environmental] agencies with it, and say, ‘Let’s talk.’”

The researchers did find that using the units in a mobile setting — on top of automobiles — means they will currently have an operating life of six months. They also identified a series of potential issues that people will have to deal with when using the Flatburn detectors generally. These include what the research team calls “drift,” the gradual changing of the detector’s readings over time, as well as “aging,” the more fundamental deterioration in a unit’s physical condition.

Still, the researchers believe the units will function well, and they are providing complete instructions in their release of Flatburn as an open-source tool. That even includes guidance for working with officials, communities, and stakeholders to process the results and attempt to shape action.

“It’s very important to engage with communities, to allow them to reflect on sources of pollution,” says Mora. 

“The original idea of the project was to democratize environmental data, and that’s still the goal,” Duarte adds. “We want people to have the skills to analyze the data and engage with communities and officials.”

Continue Reading

Trending