Science & Technology
When should data scientists try a new technique?
A new measure can help scientists decide which estimation method to use when modeling a particular data problem
Written by Adam Zewe, MIT News Office
If a scientist wanted to forecast ocean currents to understand how pollution travels after an oil spill, she could use a common approach that looks at currents traveling between 10 and 200 kilometers. Or, she could choose a newer model that also includes shorter currents. This might be more accurate, but it could also require learning new software or running new computational experiments. How to know if it will be worth the time, cost, and effort to use the new method?
A new approach developed by MIT researchers could help data scientists answer this question, whether they are looking at statistics on ocean currents, violent crime, children’s reading ability, or any number of other types of datasets.
The team created a new measure, known as the “c-value,” that helps users choose between techniques based on the chance that a new method is more accurate for a specific dataset. This measure answers the question “is it likely that the new method is more accurate for this data than the common approach?”
Traditionally, statisticians compare methods by averaging a method’s accuracy across all possible datasets. But just because a new method is better for all datasets on average doesn’t mean it will actually provide a better estimate using one particular dataset. Averages are not application-specific.
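A small simulation can make the gap between average accuracy and dataset-specific accuracy concrete. The sketch below is not the researchers' c-value. It assumes a made-up ground truth, treats the raw observations as the "common approach," and uses the classical James-Stein shrinkage estimator as a stand-in "new method." The function name `simulate` and every parameter value are illustrative assumptions.

```python
import random


def simulate(num_datasets=2000, dim=10, seed=0):
    """Compare two estimators over many simulated datasets.

    Returns (average loss of the default estimate,
             average loss of the shrinkage estimate,
             fraction of datasets where shrinkage was less accurate).
    """
    rng = random.Random(seed)
    # Assumed ground truth; it is known here only because this is a simulation.
    truth = [1.0] * dim
    total_default = 0.0
    total_new = 0.0
    new_worse = 0
    for _ in range(num_datasets):
        # One noisy observation per coordinate (unit-variance Gaussian noise).
        y = [t + rng.gauss(0.0, 1.0) for t in truth]
        # Default estimate: the observations themselves.
        # New estimate: James-Stein shrinkage of the observations toward zero.
        norm_sq = sum(v * v for v in y)
        shrink = 1.0 - (dim - 2) / norm_sq
        new_est = [shrink * v for v in y]
        # Squared-error loss against the (normally unknown) ground truth.
        loss_default = sum((v - t) ** 2 for v, t in zip(y, truth))
        loss_new = sum((v - t) ** 2 for v, t in zip(new_est, truth))
        total_default += loss_default
        total_new += loss_new
        if loss_new > loss_default:
            new_worse += 1
    return (total_default / num_datasets,
            total_new / num_datasets,
            new_worse / num_datasets)


avg_default, avg_new, frac_new_worse = simulate()
print(f"average loss, default:   {avg_default:.2f}")
print(f"average loss, shrinkage: {avg_new:.2f}")
print(f"datasets where shrinkage was worse: {frac_new_worse:.1%}")
```

Under these assumptions the shrinkage estimator has the lower loss on average, yet it is still the less accurate choice on some individual datasets. In practice the ground truth is unavailable, so the per-dataset comparison cannot be computed directly this way.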
So, researchers from MIT and elsewhere created the c-value, which is a dataset-specific tool. A high c-value means it is unlikely a new method will be less accurate than the original method on a specific data problem.
In their proof-of-concept paper, the researchers describe and evaluate the c-value using real-world data analysis problems: modeling ocean currents, estimating violent crime in neighborhoods, and approximating student reading ability at schools. They show how the c-value could help statisticians and data analysts achieve more accurate results by indicating when to use alternative estimation methods they otherwise might have ignored.
“What we are trying to do with this particular work is come up with something that is data specific. The classical notion of risk is really natural for someone developing a new method. That person wants their method to work well for all of their users on average. But a user of a method wants something that will work on their individual problem. We’ve shown that the c-value is a very practical proof-of-concept in that direction,” says senior author Tamara Broderick, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.
She’s joined on the paper by Brian Trippe PhD ’22, a former graduate student in Broderick’s group who is now a postdoc at Columbia University; and Sameer Deshpande ’13, a former postdoc in Broderick’s group who is now an assistant professor at the University of Wisconsin at Madison. An accepted version of the paper is posted online in the Journal of the American Statistical Association.
The c-value is designed to help with data problems in which researchers seek to estimate an unknown parameter using a dataset, such as estimating average student reading ability from a dataset of assessment results and student survey responses. A researcher has two estimation methods and must decide which to use for this particular problem.
The better estimation method is the one that results in less “loss,” which means the estimate will be closer to the ground truth. Consider again the forecasting of ocean currents: Perhaps being off by a few meters per hour isn’t so bad, but being off by many kilometers per hour makes the estimate useless. The ground truth is unknown, though; the scientist is trying to estimate it. Therefore, one can never actually compute the loss of an estimate for their specific data. That’s what makes comparing estimates challenging. The c-value helps a scientist navigate this challenge.
The c-value equation uses a specific dataset to compute the estimate with each method, and then once more to compute the c-value between the methods. If the c-value is large, it is unlikely that the alternative method is going to be worse and yield less accurate estimates than the original method.
“In our case, we are assuming that you conservatively want to stay with the default estimator, and you only want to go to the new estimator if you feel very confident about it. With a high c-value, it’s likely that the new estimate is more accurate. If you get a low c-value, you can’t say anything conclusive. You might have actually done better, but you just don’t know,” Broderick explains.
Probing the theory
The researchers put that theory to the test by evaluating three real-world data analysis problems.
For one, they used the c-value to help determine which approach is best for modeling ocean currents, a problem Trippe has been tackling. Accurate models are important for predicting the dispersion of contaminants, like pollution from an oil spill. The team found that estimating ocean currents using multiple scales, one larger and one smaller, likely yields higher accuracy than using only larger scale measurements.
“Oceans researchers are studying this, and the c-value can provide some statistical ‘oomph’ to support modeling the smaller scale,” Broderick says.
In another example, the researchers sought to predict violent crime in census tracts in Philadelphia, an application Deshpande has been studying. Using the c-value, they found that one could get better estimates about violent crime rates by incorporating information about census-tract-level nonviolent crime into the analysis. They also used the c-value to show that additionally leveraging violent crime data from neighboring census tracts in the analysis isn’t likely to provide further accuracy improvements.
“That doesn’t mean there isn’t an improvement, that just means that we don’t feel confident saying that you will get it,” she says.
Now that they have proven the c-value in theory and shown how it could be used to tackle real-world data problems, the researchers want to expand the measure to more types of data and a wider set of model classes.
The ultimate goal is to create a measure that is general enough for many more data analysis problems, and while there is still a lot of work to do to realize that objective, Broderick says this is an important and exciting first step in the right direction.
This research was supported, in part, by an Advanced Research Projects Agency-Energy grant, a National Science Foundation CAREER Award, the Office of Naval Research, and the Wisconsin Alumni Research Foundation.
Science & Technology
3D-printed revolving devices can sense how they are moving
A new system enables makers to incorporate sensors into gears and other rotational mechanisms with just one pass in a 3D printer
Written by Adam Zewe, MIT News Office
Integrating sensors into rotational mechanisms could make it possible for engineers to build smart hinges that know when a door has been opened, or gears inside a motor that tell a mechanic how fast they are rotating. MIT engineers have now developed a way to easily integrate sensors into these types of mechanisms, with 3D printing.
Even though advances in 3D printing enable rapid fabrication of rotational mechanisms, integrating sensors into the designs is still notoriously difficult. Due to the complexity of the rotating parts, sensors are typically embedded manually, after the device has already been produced.
However, manually integrating sensors is no easy task. Embed them inside a device and wires might get tangled in the rotating parts or obstruct their rotations, but mounting external sensors would increase the size of a mechanism and potentially limit its motion.
Instead, the new system the MIT researchers developed enables a maker to 3D print sensors directly into a mechanism’s moving parts using conductive 3D printing filament. This gives devices the ability to sense their angular position, rotation speed, and direction of rotation.
With their system, called MechSense, a maker can manufacture rotational mechanisms with integrated sensors in just one pass using a multi-material 3D printer. These types of printers utilize multiple materials at the same time to fabricate a device.
To streamline the fabrication process, the researchers built a plugin for the computer-aided design software SolidWorks that automatically integrates sensors into a model of the mechanism, which could then be sent directly to the 3D printer for fabrication.
MechSense could enable engineers to rapidly prototype devices with rotating parts, like turbines or motors, while incorporating sensing directly into the designs. It could be especially useful in creating tangible user interfaces for augmented reality environments, where sensing is critical for tracking a user’s movements and interaction with objects.
“A lot of the research that we do in our lab involves taking fabrication methods that factories or specialized institutions create and then making them accessible for people. 3D printing is a tool that a lot of people can afford to have in their homes. So how can we provide the average maker with the tools necessary to develop these types of interactive mechanisms? At the end of the day, this research all revolves around that goal,” says Marwa AlAlawi, a mechanical engineering graduate student and lead author of a paper on MechSense.
AlAlawi’s co-authors include Michael Wessely, a former postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) who is now an assistant professor at Aarhus University; and senior author Stefanie Mueller, an associate professor in the MIT departments of Electrical Engineering and Computer Science and Mechanical Engineering, and a member of CSAIL; as well as others at MIT and collaborators from Accenture Labs. The research will be presented at the ACM CHI Conference on Human Factors in Computing Systems.
To incorporate sensors into a rotational mechanism in a way that would not disrupt the device’s movement, the researchers leveraged capacitive sensing.
A capacitor consists of two plates of conductive material that have an insulating material sandwiched between them. If the overlapping area or distance between the conductive plates is changed, perhaps by rotating the mechanism, a capacitive sensor can detect resulting changes in the electric field between the plates. That information could then be used to calculate speed, for instance.
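The dependence on overlapping area and plate distance follows from the ideal parallel-plate formula, C = ε0 · εr · A / d. The sketch below applies that textbook formula. It ignores fringing fields, and the patch dimensions in the example are hypothetical values chosen for illustration, not measurements from MechSense.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, in farads per meter


def parallel_plate_capacitance(overlap_area_m2, gap_m, relative_permittivity=1.0):
    """Ideal parallel-plate capacitance in farads (fringing fields ignored)."""
    if gap_m <= 0:
        raise ValueError("gap_m must be positive")
    return relative_permittivity * EPSILON_0 * overlap_area_m2 / gap_m


# Hypothetical patch: 1 cm^2 of overlap across a 1 mm gap.
full = parallel_plate_capacitance(1.0e-4, 1.0e-3)
# Rotating the mechanism so that only half of the patch overlaps.
half = parallel_plate_capacitance(0.5e-4, 1.0e-3)
print(f"full overlap: {full:.3e} F, half overlap: {half:.3e} F")
```

In this idealized model, halving the overlap halves the capacitance, which is why a change in overlap shows up as a measurable change in the sensor reading.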
“In capacitive sensing, you don’t necessarily need to have contact between the two opposing conductive plates to monitor changes in that specific sensor. We took advantage of that for our sensor design,” AlAlawi says.
Rotational mechanisms typically consist of a rotational element located above, below, or next to a stationary element, like a gear spinning on a static shaft above a flat surface. The spinning gear is the rotational element and the flat surface beneath it is the stationary element.
The MechSense sensor includes three patches made from conductive material that are printed into the stationary plate, with each patch separated from its neighbors by nonconductive material. A fourth patch of conductive material, which has the same area as the other three patches, is printed into the rotating plate.
As the device spins, the patch on the rotating plate, called a floating capacitor, overlaps each of the patches on the stationary plate in turn. As the overlap between the rotating patch and each stationary patch changes (from completely covered, to half covered, to not covered at all), each patch individually detects the resulting change in capacitance.
The floating capacitor is not connected to any circuitry, so wires won’t get tangled with rotating components.
Rather, the stationary patches are wired to electronics that use software the researchers developed to convert raw sensor data into estimations of angular position, direction of rotation, and rotation speed.
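One way to picture that conversion is with an idealized geometric model. The sketch below is not the researchers' software. It assumes three stationary patches that each span a 120-degree arc with no gaps, a rotating patch of the same width, and noise-free readings that are exactly proportional to overlap. All names (`overlaps`, `decode_angle`, `angular_velocity`) are hypothetical.

```python
PATCH_WIDTH = 120.0            # degrees; assumed arc width of every patch
CENTERS = (0.0, 120.0, 240.0)  # assumed centers of the three stationary patches


def ang_diff(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def overlaps(theta):
    """Normalized overlap (0 to 1) of the rotating patch with each stationary patch."""
    return [max(0.0, 1.0 - abs(ang_diff(theta, c)) / PATCH_WIDTH) for c in CENTERS]


def decode_angle(readings):
    """Recover the angle of the rotating patch from three normalized readings."""
    # The patch with the largest reading is the one the rotating patch is nearest.
    i = max(range(3), key=lambda k: readings[k])
    # A reading below 1 means the rotating patch is offset from that patch's center.
    offset = (1.0 - readings[i]) * PATCH_WIDTH
    # The neighbor that also sees some overlap tells us which side the offset is on.
    next_reading = readings[(i + 1) % 3]
    prev_reading = readings[(i - 1) % 3]
    sign = 1.0 if next_reading >= prev_reading else -1.0
    return (CENTERS[i] + sign * offset) % 360.0


def angular_velocity(theta_prev, theta_now, dt):
    """Degrees per second; the sign gives the direction of rotation."""
    return ang_diff(theta_now, theta_prev) / dt


readings = overlaps(200.0)
print(decode_angle(readings))
print(angular_velocity(350.0, 10.0, 0.1))
```

In this model the rotating patch covers at most two stationary patches at once, so the strongest reading fixes the coarse position and its partially covered neighbor resolves the side. Real capacitance readings are noisy and would need filtering and calibration before a decoding step like this one.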
Enabling rapid prototyping
To simplify the sensor integration process for a user, the researchers built a SolidWorks extension. A maker specifies the rotating and stationary parts of their mechanism, as well as the center of rotation, and then the system automatically adds sensor patches to the model.
“It doesn’t change the design at all. It just replaces part of the device with a different material, in this case conductive material,” AlAlawi says.
The researchers used their system to prototype several devices, including a smart desk lamp that changes the color and brightness of its light depending on how the user rotates the bottom or middle of the lamp. They also produced a planetary gearbox, like those that are used in robotic arms, and a wheel that measures distance as it rolls across a surface.
As they prototyped, the team also conducted technical experiments to fine-tune their sensor design. They found that, as they reduced the size of the patches, the amount of error in the sensor data increased.
“In an effort to generate electronic devices with very little e-waste, we want devices with smaller footprints that can still perform well. If we take our same approach and perhaps use a different material or manufacturing process, I think we can scale down while accumulating less error using the same geometry,” she says.
In addition to testing different materials, AlAlawi and her collaborators plan to explore how they could increase the robustness of their sensor design to external noise, and also develop printable sensors for other types of moving mechanisms.
This research was funded, in part, by Accenture Labs.
Science & Technology
Where the sidewalk ends
Most cities don’t map their own pedestrian networks. Now, researchers have built the first open-source tool to let planners do just that
Written by Peter Dizikes, MIT News Office
It’s easier than ever to view maps of any place you’d like to go — by car, that is. By foot is another matter. Most cities and towns in the U.S. do not have sidewalk maps, and pedestrians are usually left to fend for themselves: Can you walk from your hotel to the restaurants on the other side of the highway? Is there a shortcut from downtown to the sports arena? And how do you get to that bus stop, anyway?
Now MIT researchers, along with colleagues from multiple other universities, have developed an open-source tool that uses aerial imagery and image-recognition to create complete maps of sidewalks and crosswalks. The tool can help planners, policymakers, and urbanists who want to expand pedestrian infrastructure.
“In the urban planning and urban policy fields, this is a huge gap,” says Andres Sevtsuk, an associate professor at MIT and a co-author of a new paper detailing the tool’s capabilities. “Most U.S. city governments know very little about their sidewalk networks. There is no data on it. The private sector hasn’t taken on the task of mapping it. It seemed like a really important technology to develop, especially in an open-source way that can be used by other places.”
The tool, called TILE2NET, has been developed using a few U.S. areas as initial sources of data, but it can be refined and adapted for use anywhere.
“We thought we needed a method that can be scalable and used in different cities,” says Maryam Hosseini, a postdoc in MIT’s City Form Lab in the Department of Urban Studies and Planning (DUSP), whose research has focused extensively on the development of the tool.
The paper, “Mapping the Walk: A Scalable Computer Vision Approach for Generating Sidewalk Network Datasets from Aerial Imagery,” appears online in the journal Computers, Environment and Urban Systems. The authors are Hosseini; Sevtsuk, who is the Charles and Ann Spaulding Career Development Associate Professor of Urban Science and Planning in DUSP and head of MIT’s City Form Lab; Fabio Miranda, an assistant professor of computer science at the University of Illinois at Chicago; Roberto M. Cesar, a professor of computer science at the University of Sao Paulo; and Claudio T. Silva, Institute Professor of Computer Science and Engineering at New York University (NYU) Tandon School of Engineering, and professor of data science at the NYU Center for Data Science.
Significant research for the project was conducted at NYU when Hosseini was a student there, working with Silva as a co-advisor.
There are multiple ways to attempt to map sidewalks and other pedestrian pathways in cities and towns. Planners could make maps manually, which is accurate but time-consuming; or they could use roads and make assumptions about the extent of sidewalks, which would reduce accuracy; or they could try tracking pedestrians, which probably would be limited in showing the full reach of walking networks.
Instead, the research team used computerized image-recognition techniques to build a tool that will visually recognize sidewalks, crosswalks, and footpaths. To do that, the researchers first used 20,000 aerial images from Boston, Cambridge, New York City, and Washington — places where comprehensive pedestrian maps already existed. By training the image-recognition model on such clearly defined objects and using portions of those cities as a starting point, they were able to see how well TILE2NET would work elsewhere in those cities.
Ultimately the tool worked well, recognizing 90 percent or more of all sidewalks and crosswalks in Boston and Cambridge, for instance. Having been trained visually on those cities, the tool can be applied to other metro areas; people elsewhere can now plug their aerial imagery into TILE2NET as well.
“We wanted to make it easier for cities in different parts of the world to do such a thing without needing to do the heavy lifting of training [the tool],” says Hosseini. “Collaboratively we will make it better and better, hopefully, as we go along.”
The need for such a tool is vast, emphasizes Sevtsuk, whose research centers on pedestrian and nonmotorized movement in cities, and who has developed multiple kinds of pedestrian-mapping tools in his career. Most cities have wildly incomplete networks of sidewalks and paths for pedestrians, he notes. And yet it is hard to expand those networks efficiently without mapping them.
“Imagine that we had the same gaps in car networks that pedestrians have in their networks,” Sevtsuk says. “You would drive to an intersection and then the road just ends. Or you can’t take a right turn since there is no road. That’s what [pedestrians] are constantly up against, and we don’t realize how important continuity is for [pedestrian] networks.”
In the still larger picture, Sevtsuk observes, the continuation of climate change means that cities will have to expand their infrastructure for pedestrians and cyclists, among other measures; transportation remains a huge source of carbon dioxide emissions.
“When cities talk about cutting carbon emissions, there’s no other way to make a big dent than to address transportation,” Sevtsuk says. “The whole world of urban data for public transit and pedestrians and bicycles is really far behind [vehicle data] in quality. Analyzing how cities can be operational without a car requires this kind of data.”
On the bright side, Sevtsuk suggests, adding pedestrian and bike infrastructure “is being done more aggressively than in many decades in the past. In the 20th century, it was the other way around, we would take away sidewalks to make space for vehicular roads. We’re now seeing the opposite trend. To make best use of pedestrian infrastructure, it’s important that cities have the network data about it. Now you can truly tell how somebody can get to a bus stop.”
Science & Technology
Low-cost device can measure air pollution anywhere
Open-source tool from MIT’s Senseable City Lab lets people check air quality, cheaply
Written by Peter Dizikes, MIT News Office
Air pollution is a major public health problem: The World Health Organization has estimated that it leads to over 4 million premature deaths worldwide annually. Still, it is not always extensively measured. But now an MIT research team is rolling out an open-source version of a low-cost, mobile pollution detector that could enable people to track air quality more widely.
The detector, called Flatburn, can be made by 3D printing or by ordering inexpensive parts. The researchers have now tested and calibrated it in relation to existing state-of-the-art machines, and are publicly releasing all the information about it — how to build it, use it, and interpret the data.
“The goal is for community groups or individual citizens anywhere to be able to measure local air pollution, identify its sources, and, ideally, create feedback loops with officials and stakeholders to create cleaner conditions,” says Carlo Ratti, director of MIT’s Senseable City Lab.
“We’ve been doing several pilots around the world, and we have refined a set of prototypes, with hardware, software, and protocols, to make sure the data we collect are robust from an environmental science point of view,” says Simone Mora, a research scientist at Senseable City Lab and co-author of a newly published paper detailing the scanner’s testing process. The Flatburn device is part of a larger project, known as City Scanner, using mobile devices to better understand urban life.
“Hopefully with the release of the open-source Flatburn we can get grassroots groups, as well as communities in less developed countries, to follow our approach and build and share knowledge,” says An Wang, a researcher at Senseable City Lab and another of the paper’s co-authors.
The paper, “Leveraging Machine Learning Algorithms to Advance Low-Cost Air Sensor Calibration in Stationary and Mobile Settings,” appears in the journal Atmospheric Environment.
In addition to Wang, Mora, and Ratti, the study’s authors are: Yuki Machida, a former research fellow at Senseable City Lab; Priyanka deSouza, an assistant professor of urban and regional planning at the University of Colorado at Denver; Tiffany Duhl, a researcher with the Massachusetts Department of Environmental Protection and a Tufts University research associate at the time of the project; Neelakshi Hudda, a research assistant professor at Tufts University; John L. Durant, a professor of civil and environmental engineering at Tufts University; and Fabio Duarte, principal research scientist at Senseable City Lab.
The Flatburn concept at Senseable City Lab dates back to about 2017, when MIT researchers began prototyping a mobile pollution detector, originally to be deployed on garbage trucks in Cambridge, Massachusetts. The detectors are battery-powered and rechargeable, either from power sources or a solar panel, with data stored on a card in the device that can be accessed remotely.
The current extension of that project involved testing the devices in New York City and the Boston area, by seeing how they performed in comparison to already-working pollution detection systems. In New York, the researchers used five detectors to collect 1.6 million data points over four weeks in 2021, working with state officials to compare the results. In Boston, the team used mobile sensors, evaluating the Flatburn devices against a state-of-the-art system deployed by Tufts University along with a state agency.
In both cases, the detectors were set up to measure concentrations of fine particulate matter as well as nitrogen dioxide, over an area of about 10 meters. Fine particulate matter refers to tiny particles often associated with burning matter, from power plants, internal combustion engines in autos, fires, and more.
The research team found that the mobile detectors estimated somewhat lower concentrations of fine particulate matter than the devices already in use, but with a strong enough correlation so that, with adjustments for weather conditions and other factors, the Flatburn devices can produce reliable results.
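The idea of adjusting a low-cost sensor against a reference instrument can be illustrated with a one-variable least-squares fit. This is only a simplified stand-in: the paper describes machine learning calibration that also accounts for weather and other factors, and the numbers below are synthetic values invented for illustration, not data from the study. The helper names `fit_line` and `calibrate` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(raw_reading, slope, intercept):
    """Map a raw low-cost reading onto the reference instrument's scale."""
    return slope * raw_reading + intercept


# Synthetic example: a low-cost sensor that consistently reads low.
reference = [5.0, 10.0, 15.0, 20.0, 25.0]   # reference instrument (made up)
raw = [0.8 * r - 1.0 for r in reference]    # low-cost sensor (made up)

slope, intercept = fit_line(raw, reference)
print(f"calibrated = {slope:.3f} * raw + {intercept:.3f}")
print(calibrate(raw[2], slope, intercept))
```

A strong correlation between the two instruments is what makes such a correction workable: a consistent bias can be fitted and removed, whereas uncorrelated error cannot.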
“After following their deployment for a few months we can confidently say our low-cost monitors should behave the same way [as standard detectors],” Wang says. “We have a big vision, but we still have to make sure the data we collect is valid and can be used for regulatory and policy purposes.”
Duarte adds: “If you follow these procedures with low-cost sensors you can still acquire good enough data to go back to [environmental] agencies with it, and say, ‘Let’s talk.’”
The researchers did find that using the units in a mobile setting — on top of automobiles — means they will currently have an operating life of six months. They also identified a series of potential issues that people will have to deal with when using the Flatburn detectors generally. These include what the research team calls “drift,” the gradual changing of the detector’s readings over time, as well as “aging,” the more fundamental deterioration in a unit’s physical condition.
Still, the researchers believe the units will function well, and they are providing complete instructions in their release of Flatburn as an open-source tool. That even includes guidance for working with officials, communities, and stakeholders to process the results and attempt to shape action.
“It’s very important to engage with communities, to allow them to reflect on sources of pollution,” says Mora.
“The original idea of the project was to democratize environmental data, and that’s still the goal,” Duarte adds. “We want people to have the skills to analyze the data and engage with communities and officials.”