Written by Adam Zewe, MIT News Office
Microcontrollers, miniature computers that can run simple commands, are the basis for billions of connected devices, from internet-of-things (IoT) devices to sensors in automobiles. But cheap, low-power microcontrollers have extremely limited memory and no operating system, making it challenging to train artificial intelligence models on “edge devices” that work independently from central computing resources.
Training a machine-learning model on an intelligent edge device allows it to adapt to new data and make better predictions. For instance, training a model on a smart keyboard could enable the keyboard to continually learn from the user’s writing. However, the training process requires so much memory that it is typically done using powerful computers at a data center, before the model is deployed on a device. This is more costly and raises privacy issues since user data must be sent to a central server.
To address this problem, researchers at MIT and the MIT-IBM Watson AI Lab developed a new technique that enables on-device training using less than a quarter of a megabyte of memory. Other training solutions designed for connected devices can use more than 500 megabytes of memory, greatly exceeding the 256-kilobyte capacity of most microcontrollers (there are 1,024 kilobytes in one megabyte).
The intelligent algorithms and framework the researchers developed reduce the amount of computation required to train a model, which makes the process faster and more memory efficient. Their technique can be used to train a machine-learning model on a microcontroller in a matter of minutes.
This technique also preserves privacy by keeping data on the device, which could be especially beneficial when data are sensitive, such as in medical applications. It also could enable customization of a model based on the needs of users. Moreover, the framework preserves or improves the accuracy of the model when compared to other training approaches.
“Our study enables IoT devices to not only perform inference but also continuously update the AI models to newly collected data, paving the way for lifelong on-device learning. The low resource utilization makes deep learning more accessible and can have a broader reach, especially for low-power edge devices,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and senior author of the paper describing this innovation.
Joining Han on the paper are co-lead authors and EECS PhD students Ji Lin and Ligeng Zhu, as well as MIT postdocs Wei-Ming Chen and Wei-Chen Wang, and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems.
A common type of machine-learning model is known as a neural network. Loosely based on the human brain, these models contain layers of interconnected nodes, or neurons, that process data to complete a task, such as recognizing people in photos. The model must be trained first, which involves showing it millions of examples so it can learn the task. As it learns, the model increases or decreases the strength of the connections between neurons, which are known as weights.
The model may undergo hundreds of updates as it learns, and the intermediate activations must be stored during each round. Activations are the intermediate results produced by a neural network’s middle layers. Because there may be millions of weights and activations, training a model requires much more memory than running a pre-trained model, Han explains.
Han and his collaborators employed two algorithmic solutions to make the training process more efficient and less memory-intensive. The first, known as sparse update, uses an algorithm that identifies the most important weights to update at each round of training. The algorithm starts freezing the weights one at a time until it sees the accuracy dip to a set threshold, then it stops. The remaining weights are updated, while the activations corresponding to the frozen weights don’t need to be stored in memory.
“Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy. For our method, we selectively update those important weights and make sure the accuracy is fully preserved,” Han says.
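The freeze-until-accuracy-dips idea can be illustrated with a short Python sketch. This is a toy under stated assumptions, not the researchers' algorithm: whole layers stand in for groups of weights, the layer names, accuracy costs, and activation sizes are invented, and `toy_accuracy` is a hypothetical stand-in for a real evaluation run.

```python
# Toy illustration of greedy freezing for a sparse update.
# All layer names and numbers below are made up for this sketch.

def choose_frozen_layers(candidates, accuracy_fn, max_drop):
    """Freeze candidates in order until the next freeze would cost too much.

    candidates  -- layer names, ordered from least to most important (assumed known)
    accuracy_fn -- hypothetical callback: frozenset of frozen layers -> accuracy
    max_drop    -- largest accuracy loss tolerated, in the same units
    """
    baseline = accuracy_fn(frozenset())
    frozen = []
    for layer in candidates:
        trial = frozenset(frozen + [layer])
        if baseline - accuracy_fn(trial) > max_drop:
            break  # freezing this layer would dip past the threshold, so stop
        frozen.append(layer)
    return frozen


# Made-up accuracy cost of freezing each layer, in hundredths of a percentage point.
COST = {"conv1": 10, "conv2": 20, "conv3": 150, "classifier": 300}
# Made-up activation memory per layer, in kilobytes.
ACTIVATION_KB = {"conv1": 64, "conv2": 32, "conv3": 16, "classifier": 1}


def toy_accuracy(frozen):
    # Stand-in for a real evaluation: start at 90.00% and subtract each cost.
    return 9000 - sum(COST[name] for name in frozen)


frozen = choose_frozen_layers(list(COST), toy_accuracy, max_drop=50)
updated = [name for name in COST if name not in frozen]
saved_kb = sum(ACTIVATION_KB[name] for name in frozen)
print("frozen:", frozen)
print("still updated:", updated)
print("activation memory not stored (KB):", saved_kb)
```

In this toy, the two cheapest layers are frozen and the two most important ones keep training, which is the trade-off Han describes: skip the updates that matter least and keep the ones that protect accuracy.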
Their second solution involves quantized training and simplifying the weights, which are typically 32 bits. An algorithm rounds the weights so they are only eight bits, through a process known as quantization, which cuts the amount of memory for both training and inference. Inference is the process of applying a model to a dataset and generating a prediction. Then the algorithm applies a technique called quantization-aware scaling (QAS), which acts like a multiplier to adjust the ratio between weight and gradient, to avoid any drop in accuracy that may come from quantized training.
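To make the rounding step concrete, here is a minimal Python sketch of one common way to quantize values to eight bits, using a single shared scale factor. It is an assumption-laden illustration rather than the researchers' implementation: the function names and sample weights are invented, and the sketch does not attempt quantization-aware scaling.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.6, -1.0, 0.25, 0.0]  # made-up 32-bit weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("int8 values:", q)
print("largest rounding error:", max(abs(a - b) for a, b in zip(weights, restored)))
# Each value shrinks from 4 bytes (32 bits) to 1 byte (8 bits).
print("bytes at 32 bits:", len(weights) * 4, "| bytes at 8 bits:", len(weights) * 1)
```

The rounding error is bounded by half the scale factor, which is why a well-chosen scale loses little precision while cutting storage to a quarter.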
The researchers developed a system, called a tiny training engine, that can run these algorithmic innovations on a simple microcontroller that lacks an operating system. This system changes the order of steps in the training process so more work is completed in the compilation stage, before the model is deployed on the edge device.
“We push a lot of the computation, such as auto-differentiation and graph optimization, to compile time. We also aggressively prune the redundant operators to support sparse updates. Once at runtime, we have much less workload to do on the device,” Han explains.
A successful speedup
Their optimization only required 157 kilobytes of memory to train a machine-learning model on a microcontroller, whereas other techniques designed for lightweight training would still need between 300 and 600 megabytes.
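A quick back-of-the-envelope check, written in Python, puts those figures side by side. The numbers come from the article (157 kilobytes, 300 to 600 megabytes, and the 256-kilobyte capacity of most microcontrollers); the variable names are just for illustration.

```python
KB_PER_MB = 1024  # 1,024 kilobytes in one megabyte

microcontroller_kb = 256        # typical microcontroller capacity
new_method_kb = 157             # memory the new technique needed
other_methods_mb = (300, 600)   # range reported for other lightweight methods

fits = new_method_kb <= microcontroller_kb
headroom_kb = microcontroller_kb - new_method_kb
ratios = [mb * KB_PER_MB / new_method_kb for mb in other_methods_mb]

print("fits on a 256 KB device:", fits)
print("headroom (KB):", headroom_kb)
print("other methods use roughly %.0fx to %.0fx more memory" % tuple(ratios))
```

In other words, the other approaches would need on the order of a couple of thousand times more memory than the new technique used.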
They tested their framework by training a computer vision model to detect people in images. After only 10 minutes of training, it learned to complete the task successfully. Their method was able to train a model more than 20 times faster than other approaches.
Now that they have demonstrated the success of these techniques for computer vision models, the researchers want to apply them to language models and different types of data, such as time-series data. At the same time, they want to use what they’ve learned to shrink the size of larger models without sacrificing accuracy, which could help reduce the carbon footprint of training large-scale machine-learning models.
This work is funded by the National Science Foundation, the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, Amazon, Intel, Qualcomm, Ford Motor Company, and Google.
A far-sighted approach to machine learning
New system can teach a group of cooperative or competitive AI agents to find an optimal long-term solution
Written by Adam Zewe, MIT News Office
Picture two teams squaring off on a football field. The players can cooperate to achieve an objective, and compete against other players with conflicting interests. That’s how the game works.
Creating artificial intelligence agents that can learn to compete and cooperate as effectively as humans remains a thorny problem. A key challenge is enabling AI agents to anticipate future behaviors of other agents when they are all learning simultaneously.
Because of the complexity of this problem, current approaches tend to be myopic; the agents can only guess the next few moves of their teammates or competitors, which leads to poor performance in the long run.
Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents a farsighted perspective. Their machine-learning framework enables cooperative or competitive AI agents to consider what other agents will do as time approaches infinity, not just over the next few steps. The agents then adapt their behaviors accordingly to influence other agents’ future behaviors and arrive at an optimal, long-term solution.
This framework could be used by a group of autonomous drones working together to find a lost hiker in a thick forest, or by self-driving cars that strive to keep passengers safe by anticipating future moves of other vehicles driving on a busy highway.
“When AI agents are cooperating or competing, what matters most is when their behaviors converge at some point in the future. There are a lot of transient behaviors along the way that don’t matter very much in the long run. Reaching this converged behavior is what we really care about, and we now have a mathematical way to enable that,” says Dong-Ki Kim, a graduate student in the MIT Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing this framework.
The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others at the MIT-IBM Watson AI Lab, IBM Research, Mila-Quebec Artificial Intelligence Institute, and Oxford University. The research will be presented at the Conference on Neural Information Processing Systems.
More agents, more problems
The researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns by trial and error. Researchers give the agent a reward for “good” behaviors that help it achieve a goal. The agent adapts its behavior to maximize that reward until it eventually becomes an expert at a task.
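Trial-and-error learning of this kind can be shown with a very small single-agent example in Python. This is a generic textbook-style sketch (an epsilon-greedy "bandit"), not the researchers' method: the three actions, their rewards, and the function name are invented, and rewards are kept noise-free so the behavior is easy to follow.

```python
import random


def train_bandit(mean_rewards, steps=200, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays best.

    mean_rewards -- made-up reward for each action (each assumed below 1.0)
    epsilon      -- fraction of steps spent trying a random action
    """
    rng = random.Random(seed)
    n = len(mean_rewards)
    estimates = [1.0] * n  # optimistic start, so every action gets tried
    counts = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try something at random
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = mean_rewards[action]  # noise-free reward for simplicity
        counts[action] += 1
        # Running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = train_bandit([0.2, 0.5, 0.9])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 3) for e in estimates])
print("times each action was tried:", counts)
print("preferred action:", best)
```

After a few early tries, the agent settles on the action with the highest reward and returns to the others only occasionally, which is the "becoming an expert" behavior described above.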
But when many cooperative or competing agents are simultaneously learning, things become increasingly complex. As agents consider more future steps of their fellow agents, and how their own behavior influences others, the problem soon requires far too much computational power to solve efficiently. This is why other approaches only focus on the short term.
“The AIs really want to think about the end of the game, but they don’t know when the game will end. They need to think about how to keep adapting their behavior into infinity so they can win at some far time in the future. Our paper essentially proposes a new objective that enables an AI to think about infinity,” says Kim.
But since it is impossible to plug infinity into an algorithm, the researchers designed their system so agents focus on a future point where their behavior will converge with that of other agents, known as equilibrium. An equilibrium point determines the long-term performance of agents, and multiple equilibria can exist in a multiagent scenario. Therefore, an effective agent actively influences the future behaviors of other agents in such a way that they reach a desirable equilibrium from the agent’s perspective. If all agents influence each other, they converge to a general concept that the researchers call an “active equilibrium.”
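The point that several equilibria can coexist, with some more desirable than others, can be seen in a classic two-player game. The Python sketch below is a standard game-theory illustration, not the FURTHER framework or the "active equilibrium" itself: the payoff numbers are invented, and it only checks for outcomes where neither agent can gain by changing its own action.

```python
from itertools import product

ACTIONS = ("stag", "hare")
# PAYOFFS[(a, b)] = (reward to agent A, reward to agent B); numbers are made up.
PAYOFFS = {
    ("stag", "stag"): (4, 4),  # cooperate on the big prize
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),  # both play it safe
}


def pure_equilibria(payoffs, actions):
    """Return action pairs where neither agent gains by switching alone."""
    stable = []
    for a, b in product(actions, repeat=2):
        reward_a, reward_b = payoffs[(a, b)]
        a_can_improve = any(payoffs[(alt, b)][0] > reward_a for alt in actions)
        b_can_improve = any(payoffs[(a, alt)][1] > reward_b for alt in actions)
        if not a_can_improve and not b_can_improve:
            stable.append((a, b))
    return stable


for outcome in pure_equilibria(PAYOFFS, ACTIONS):
    print(outcome, "->", PAYOFFS[outcome])
```

Both "stag, stag" and "hare, hare" are stable here, but the first pays each agent more. An agent that can nudge its partner toward the better of the two is doing, in miniature, what the article describes as steering others toward a desirable equilibrium.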
The machine-learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward), enables agents to learn how to adapt their behaviors as they interact with other agents to achieve this active equilibrium.
FURTHER does this using two machine-learning modules. The first, an inference module, enables an agent to guess the future behaviors of other agents and the learning algorithms they use, based solely on their prior actions.
This information is fed into the reinforcement learning module, which the agent uses to adapt its behavior and influence other agents in a way that maximizes its reward.
“The challenge was thinking about infinity. We had to use a lot of different mathematical tools to enable that, and make some assumptions to get it to work in practice,” Kim says.
Winning in the long run
They tested their approach against other multiagent reinforcement learning frameworks in several different scenarios, including a pair of robots fighting sumo-style and a battle pitting two 25-agent teams against one another. In both instances, the AI agents using FURTHER won the games more often.
Since their approach is decentralized, which means the agents learn to win the games independently, it is also more scalable than other methods that require a central computer to control the agents, Kim explains.
The researchers used games to test their approach, but FURTHER could be used to tackle any kind of multiagent problem. For instance, it could be applied by economists seeking to develop sound policy in situations where many interacting entities have behaviors and interests that change over time.
Economics is one application Kim is particularly excited about studying. He also wants to dig deeper into the concept of an active equilibrium and continue enhancing the FURTHER framework.
This research is funded, in part, by the MIT-IBM Watson AI Lab.
Flocks of assembler robots show potential for making larger structures
Researchers make progress toward groups of robots that could build almost anything, including buildings, vehicles, and even bigger robots
Written by David L. Chandler, MIT News Office
Researchers at MIT have taken significant steps toward creating robots that could practically and economically assemble nearly anything, including things much larger than themselves, from vehicles to buildings to larger robots.
The new work, from MIT’s Center for Bits and Atoms (CBA), builds on years of research, including recent studies demonstrating that objects such as a deformable airplane wing and a functional racing car could be assembled from tiny identical lightweight pieces — and that robotic devices could be built to carry out some of this assembly work. Now, the team has shown that both the assembler bots and the components of the structure being built can all be made of the same subunits, and the robots can move independently in large numbers to accomplish large-scale assemblies quickly.
The new work is reported in the journal Nature Communications Engineering, in a paper by CBA doctoral student Amira Abdel-Rahman, Professor and CBA Director Neil Gershenfeld, and three others.
A fully autonomous self-replicating robot assembly system capable of both assembling larger structures, including larger robots, and planning the best construction sequence is still years away, Gershenfeld says. But the new work makes important strides toward that goal, including working out the complex tasks of when to build more robots and how big to make them, as well as how to organize swarms of bots of different sizes to build a structure efficiently without crashing into each other.
As in previous experiments, the new system involves large, usable structures built from an array of tiny identical subunits called voxels (the volumetric equivalent of a 2-D pixel). But while earlier voxels were purely mechanical structural pieces, the team has now developed complex voxels that each can carry both power and data from one unit to the next. This could enable the building of structures that can not only bear loads but also carry out work, such as lifting, moving and manipulating materials — including the voxels themselves.
“When we’re building these structures, you have to build in intelligence,” Gershenfeld says. While earlier versions of assembler bots were connected by bundles of wires to their power source and control systems, “what emerged was the idea of structural electronics — of making voxels that transmit power and data as well as force.” Looking at the new system in operation, he points out, “There’s no wires. There’s just the structure.”
The robots themselves consist of a string of several voxels joined end-to-end. These can grab another voxel using attachment points on one end, then move inchworm-like to the desired position, where the voxel can be attached to the growing structure and released there.
Gershenfeld explains that while the earlier system demonstrated by members of his group could in principle build arbitrarily large structures, as the size of those structures reached a certain point in relation to the size of the assembler robot, the process would become increasingly inefficient because of the ever-longer paths each bot would have to travel to bring each piece to its destination. At that point, with the new system, the bots could decide it was time to build a larger version of themselves that could reach longer distances and reduce the travel time. An even bigger structure might require yet another such step, with the new larger robots creating yet larger ones, while parts of a structure that include lots of fine detail may require more of the smallest robots.
As these robotic devices work on assembling something, Abdel-Rahman says, they face choices at every step along the way: “It could build a structure, or it could build another robot of the same size, or it could build a bigger robot.” Part of the work the researchers have been focusing on is creating the algorithms for such decision-making.
“For example, if you want to build a cone or a half-sphere,” she says, “how do you start the path planning, and how do you divide this shape” into different areas that different bots can work on? The software they developed allows someone to input a shape and get an output that shows where to place the first block, and each one after that, based on the distances that need to be traversed.
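As a rough picture of what "input a shape, get a placement order" could look like, here is a small Python sketch. It is purely hypothetical and not the team's software: it swaps in a plain breadth-first search as a stand-in for their planning, orders voxels by how many steps they are from a chosen starting voxel, and uses an invented four-voxel shape.

```python
from collections import deque


def build_order(shape, seed):
    """Return voxels in breadth-first order from the seed, nearest first.

    shape -- set of (x, y, z) voxel positions making up the target structure
    seed  -- position of the first voxel to place (must be in the shape)
    Voxels not connected to the seed are left out of the result.
    """
    if seed not in shape:
        raise ValueError("seed voxel must be part of the shape")
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        x, y, z = queue.popleft()
        order.append((x, y, z))
        for dx, dy, dz in steps:
            nxt = (x + dx, y + dy, z + dz)
            if nxt in shape and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# A made-up L-shaped structure, four voxels in total.
shape = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0)}
for step, voxel in enumerate(build_order(shape, (0, 0, 0)), start=1):
    print(step, voxel)
```

Because each voxel is placed only after a neighbor is already in position, a robot walking on the structure always has something to stand on. Dividing the shape among several robots, as Abdel-Rahman describes, is a harder problem that this sketch does not attempt.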
There are thousands of papers published on route-planning for robots, Gershenfeld says. “But the step after that, of the robot having to make the decision to build another robot or a different kind of robot — that’s new. There’s really nothing prior on that.”
While the experimental system can carry out the assembly and includes the power and data links, in the current versions the connectors between the tiny subunits are not strong enough to bear the necessary loads. The team, including graduate student Miana Smith, is now focusing on developing stronger connectors. “These robots can walk and can place parts,” Gershenfeld says, “but we are almost — but not quite — at the point where one of these robots makes another one and it walks away. And that’s down to fine-tuning of things, like the force of actuators and the strength of joints. … But it’s far enough along that these are the parts that will lead to it.”
Ultimately, such systems might be used to construct a wide variety of large, high-value structures. For example, currently the way airplanes are built involves huge factories with gantries much larger than the components they build, and then “when you make a jumbo jet, you need jumbo jets to carry the parts of the jumbo jet to make it,” Gershenfeld says. With a system like this built up from tiny components assembled by tiny robots, “The final assembly of the airplane is the only assembly.”
Similarly, in producing a new car, “you can spend a year on tooling” before the first car actually gets built, he says. The new system would bypass that whole process. Such potential efficiencies are why Gershenfeld and his students have been working closely with car companies, aviation companies, and NASA. But even the relatively low-tech building construction industry could also benefit.
While there has been increasing interest in 3-D-printed houses, today those require printing machinery as large as or larger than the house being built. Again, assembling such structures with swarms of tiny robots instead could provide benefits. And the Defense Advanced Research Projects Agency is also interested in the work for the possibility of building structures for coastal protection against erosion and sea level rise.
The research team also included MIT-CBA student Benjamin Jenett and Christopher Cameron, who is now at the U.S. Army Research Laboratory. The work was supported by NASA, the U.S. Army Research Laboratory, and CBA consortia funding.
Study: Automation drives income inequality
New data suggest most of the growth in the wage gap since 1980 comes from automation displacing less-educated workers
Written by Peter Dizikes, MIT News
When you use self-checkout machines in supermarkets and drugstores, you are probably not — with all due respect — doing a better job of bagging your purchases than checkout clerks once did. Automation just makes bagging less expensive for large retail chains.
“If you introduce self-checkout kiosks, it’s not going to change productivity all that much,” says MIT economist Daron Acemoglu. However, in terms of lost wages for employees, he adds, “It’s going to have fairly large distributional effects, especially for low-skill service workers. It’s a labor-shifting device, rather than a productivity-increasing device.”
A newly published study co-authored by Acemoglu quantifies the extent to which automation has contributed to income inequality in the U.S., simply by replacing workers with technology — whether self-checkout machines, call-center systems, assembly-line technology, or other devices. Over the last four decades, the income gap between more- and less-educated workers has grown significantly; the study finds that automation accounts for more than half of that increase.
“This single one variable … explains 50 to 70 percent of the changes or variation between group inequality from 1980 to about 2016,” Acemoglu says.
The paper, “Tasks, Automation, and the Rise in U.S. Wage Inequality,” is being published in Econometrica. The authors are Acemoglu, who is an Institute Professor at MIT, and Pascual Restrepo PhD ’16, an assistant professor of economics at Boston University.
So much “so-so automation”
Since 1980 in the U.S., inflation-adjusted incomes of those with college and postgraduate degrees have risen substantially, while inflation-adjusted earnings of men without high school degrees have dropped by 15 percent.
How much of this change is due to automation? Growing income inequality could also stem from, among other things, the declining prevalence of labor unions, market concentration begetting a lack of competition for labor, or other types of technological change.
To conduct the study, Acemoglu and Restrepo used U.S. Bureau of Economic Analysis statistics on the extent to which human labor was used in 49 industries from 1987 to 2016, as well as data on machinery and software adopted in that time. The scholars also used data they had previously compiled about the adoption of robots in the U.S. from 1993 to 2014. In previous studies, Acemoglu and Restrepo have found that robots have by themselves replaced a substantial number of workers in the U.S., helped some firms dominate their industries, and contributed to inequality.
At the same time, the scholars used U.S. Census Bureau metrics, including its American Community Survey data, to track worker outcomes during this time for roughly 500 demographic subgroups, broken out by gender, education, age, race and ethnicity, and immigration status, while looking at employment, inflation-adjusted hourly wages, and more, from 1980 to 2016. By examining the links between changes in business practices and changes in labor market outcomes, the study can estimate what impact automation has had on workers.
Ultimately, Acemoglu and Restrepo conclude that the effects have been profound. Since 1980, for instance, they estimate that automation has reduced the wages of men without a high school degree by 8.8 percent and women without a high school degree by 2.3 percent, adjusted for inflation.
A central conceptual point, Acemoglu says, is that automation should be regarded differently from other forms of innovation, with its own distinct effects in workplaces, and not just lumped in as part of a broader trend toward the implementation of technology in everyday life generally.
Consider again those self-checkout kiosks. Acemoglu calls these types of tools “so-so technology,” or “so-so automation,” because of the tradeoffs they contain: Such innovations are good for the corporate bottom line, bad for service-industry employees, and not hugely important in terms of overall productivity gains, the real marker of an innovation that may improve our overall quality of life.
“Technological change that creates or increases industry productivity, or productivity of one type of labor, creates [those] large productivity gains but does not have huge distributional effects,” Acemoglu says. “In contrast, automation creates very large distributional effects and may not have big productivity effects.”
A new perspective on the big picture
The results occupy a distinctive place in the literature on automation and jobs. Some popular accounts of technology have forecast a near-total wipeout of jobs in the future. Alternatively, many scholars have developed a more nuanced picture, in which technology disproportionately benefits highly educated workers but also produces significant complementarities between high-tech tools and labor.
The current study differs, at least by degree, from this latter picture, presenting a starker outlook in which automation reduces earnings power for workers and potentially reduces the extent to which policy solutions, such as more bargaining power for workers and less market concentration, could mitigate the detrimental effects of automation on wages.
“These are controversial findings in the sense that they imply a much bigger effect for automation than anyone else has thought, and they also imply less explanatory power for other [factors],” Acemoglu says.
Still, he adds, in the effort to identify drivers of income inequality, the study “does not obviate other nontechnological theories completely. Moreover, the pace of automation is often influenced by various institutional factors, including labor’s bargaining power.”
Labor economists say the study is an important addition to the literature on automation, work, and inequality, and should be reckoned with in future discussions of these issues.
For their part, in the paper Acemoglu and Restrepo identify multiple directions for future research. That includes investigating the reaction over time by both business and labor to the increase in automation; the quantitative effects of technologies that do create jobs; and the industry competition between firms that quickly adopted automation and those that did not.
The research was supported in part by Google, the Hewlett Foundation, Microsoft, the National Science Foundation, Schmidt Sciences, the Sloan Foundation, and the Smith Richardson Foundation.