Science & Technology
New method accelerates data retrieval in huge databases
Researchers used machine learning to build faster and more efficient hash functions, which are a key component of databases

- Share
- Tweet
- New method accelerates data retrieval in huge databases https://news.educationpostonline.com/2023/03/16/new-method-accelerates-data-retrieval-in-huge-databases/">
Written by Adam Zewe, MIT News Office
Hashing is a core operation in most online databases, like a library catalogue or an e-commerce website. A hash function generates codes that directly determine the location where data would be stored. So, using these codes, it is easier to find and retrieve the data.
However, because traditional hash functions generate codes randomly, sometimes two pieces of data can be hashed with the same value. This causes collisions — when searching for one item points a user to many pieces of data with the same hash value. It takes much longer to find the right one, resulting in slower searches and reduced performance.
Certain types of hash functions, known as perfect hash functions, are designed to place the data in a way that prevents collisions. But they are time-consuming to construct for each dataset and take more time to compute than traditional hash functions.
Since hashing is used in so many applications, from database indexing to data compression to cryptography, fast and efficient hash functions are critical. So, researchers from MIT and elsewhere set out to see if they could use machine learning to build better hash functions.
They found that, in certain situations, using learned models instead of traditional hash functions could result in half as many collisions. These learned models are created by running a machine-learning algorithm on a dataset to capture specific characteristics. The team’s experiments also showed that learned models were often more computationally efficient than perfect hash functions.
“What we found in this work is that in some situations we can come up with a better tradeoff between the computation of the hash function and the collisions we will face. In these situations, the computation time for the hash function can be increased a bit, but at the same time its collisions can be reduced very significantly,” says Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Their research, which will be presented at the 2023 International Conference on Very Large Databases, demonstrates how a hash function can be designed to significantly speed up searches in a huge database. For instance, their technique could accelerate computational systems that scientists use to store and analyze DNA, amino acid sequences, or other biological information.
Sabek is the co-lead author of the paper with Department of Electrical Engineering and Computer Science (EECS) graduate student Kapil Vaidya. They are joined by co-authors Dominick Horn, a graduate student at the Technical University of Munich; Andreas Kipf, an MIT postdoc; Michael Mitzenmacher, professor of computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences; and senior author Tim Kraska, associate professor of EECS at MIT and co-director of the Data, Systems, and AI Lab.
Hashing it out
Given a data input, or key, a traditional hash function generates a random number, or code, that corresponds to the slot where that key will be stored. To use a simple example, if there are 10 keys to be put into 10 slots, the function would generate an integer between 1 and 10 for each input. It is highly probable that two keys will end up in the same slot, causing collisions.
Perfect hash functions provide a collision-free alternative. Researchers give the function some extra knowledge, such as the number of slots the data are to be placed into. Then it can perform additional computations to figure out where to put each key to avoid collisions. However, these added computations make the function harder to create and less efficient.
“We were wondering, if we know more about the data — that it will come from a particular distribution — can we use learned models to build a hash function that can actually reduce collisions?” Vaidya says.
A data distribution shows all possible values in a dataset, and how often each value occurs. The distribution can be used to calculate the probability that a particular value is in a data sample.
The researchers took a small sample from a dataset and used machine learning to approximate the shape of the data’s distribution, or how the data are spread out. The learned model then uses the approximation to predict the location of a key in the dataset.
They found that learned models were easier to build and faster to run than perfect hash functions and that they led to fewer collisions than traditional hash functions if data are distributed in a predictable way. But if the data are not predictably distributed because gaps between data points vary too widely, using learned models might cause more collisions.
“We may have a huge number of data inputs, and the gaps between consecutive inputs are very different, so learning a model to capture the data distribution of these inputs is quite difficult,” Sabek explains.
Fewer collisions, faster results
When data were predictably distributed, learned models could reduce the ratio of colliding keys in a dataset from 30 percent to 15 percent, compared with traditional hash functions. They were also able to achieve better throughput than perfect hash functions. In the best cases, learned models reduced the runtime by nearly 30 percent.
As they explored the use of learned models for hashing, the researchers also found that throughput was impacted most by the number of sub-models. Each learned model is composed of smaller linear models that approximate the data distribution for different parts of the data. With more sub-models, the learned model produces a more accurate approximation, but it takes more time.
“At a certain threshold of sub-models, you get enough information to build the approximation that you need for the hash function. But after that, it won’t lead to more improvement in collision reduction,” Sabek says.
Building off this analysis, the researchers want to use learned models to design hash functions for other types of data. They also plan to explore learned hashing for databases in which data can be inserted or deleted. When data are updated in this way, the model needs to change accordingly, but changing the model while maintaining accuracy is a difficult problem.
“We want to encourage the community to use machine learning inside more fundamental data structures and algorithms. Any kind of core data structure presents us with an opportunity to use machine learning to capture data properties and get better performance. There is still a lot we can explore,” Sabek says.
This work was supported, in part, by Google, Intel, Microsoft, the U.S. National Science Foundation, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.
Science & Technology
3D-printed revolving devices can sense how they are moving
A new system enables makers to incorporate sensors into gears and other rotational mechanisms with just one pass in a 3D printer

**Images/video: https://news.mit.edu/2023/3d-printing-revolving-devices-sensors-0316
Written by Adam Zewe, MIT News Office
Integrating sensors into rotational mechanisms could make it possible for engineers to build smart hinges that know when a door has been opened, or gears inside a motor that tell a mechanic how fast they are rotating. MIT engineers have now developed a way to easily integrate sensors into these types of mechanisms, with 3D printing.

Credits:Credit: Courtesy of the researchers. Edited by MIT News
Even though advances in 3D printing enable rapid fabrication of rotational mechanisms, integrating sensors into the designs is still notoriously difficult. Due to the complexity of the rotating parts, sensors are typically embedded manually, after the device has already been produced.
However, manually integrating sensors is no easy task. Embed them inside a device and wires might get tangled in the rotating parts or obstruct their rotations, but mounting external sensors would increase the size of a mechanism and potentially limit its motion.
Instead, the new system the MIT researchers developed enables a maker to 3D print sensors directly into a mechanism’s moving parts using conductive 3D printing filament. This gives devices the ability to sense their angular position, rotation speed, and direction of rotation.
With their system, called MechSense, a maker can manufacture rotational mechanisms with integrated sensors in just one pass using a multi-material 3D printer. These types of printers utilize multiple materials at the same time to fabricate a device.
To streamline the fabrication process, the researchers built a plugin for the computer-aided design software SolidWorks that automatically integrates sensors into a model of the mechanism, which could then be sent directly to the 3D printer for fabrication.
MechSense could enable engineers to rapidly prototype devices with rotating parts, like turbines or motors, while incorporating sensing directly into the designs. It could be especially useful in creating tangible user interfaces for augmented reality environments, where sensing is critical for tracking a user’s movements and interaction with objects.
“A lot of the research that we do in our lab involves taking fabrication methods that factories or specialized institutions create and then making then accessible for people. 3D printing is a tool that a lot of people can afford to have in their homes. So how can we provide the average maker with the tools necessary to develop these types of interactive mechanisms? At the end of the day, this research all revolves around that goal,” says Marwa AlAlawi, a mechanical engineering graduate student and lead author of a paper on MechSense.
AlAlawi’s co-authors include Michael Wessely, a former postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) who is now an assistant professor at Aarhus University; and senior author Stefanie Mueller, an associate professor in the MIT departments of Electrical Engineering and Computer Science and Mechanical Engineering, and a member CSAIL; as well as others at MIT and collaborators from Accenture Labs. The research will be presented at the ACM CHI Conference on Human Factors in Computing Systems.
Built-in sensing
To incorporate sensors into a rotational mechanism in a way that would not disrupt the device’s movement, the researchers leveraged capacitive sensing.
A capacitor consists of two plates of conductive material that have an insulating material sandwiched between them. If the overlapping area or distance between the conductive plates is changed, perhaps by rotating the mechanism, a capacitive sensor can detect resulting changes in the electric field between the plates. That information could then be used to calculate speed, for instance.
“In capacitive sensing, you don’t necessarily need to have contact between the two opposing conductive plates to monitor changes in that specific sensor. We took advantage of that for our sensor design,” AlAlawi says.
Rotational mechanisms typically consist of a rotational element located above, below, or next to a stationary element, like a gear spinning on a static shaft above a flat surface. The spinning gear is the rotational element and the flat surface beneath it is the stationary element.
The MechSense sensor includes three patches made from conductive material that are printed into the stationary plate, with each patch separated from its neighbors by nonconductive material. A fourth patch of conductive material, which has the same area as the other three patches, is printed into the rotating plate.
As the device spins, the patch on the rotating plate, called a floating capacitor, overlaps each of the patches on the stationary plate in turn. As the overlap between the rotating patch and each stationary patch changes (from completely covered, to half covered, to not covered at all), each patch individually detects the resulting change in capacitance.
The floating capacitor is not connected to any circuitry, so wires won’t get tangled with rotating components.
Rather, the stationary patches are wired to electronics that use software the researchers developed to convert raw sensor data into estimations of angular position, direction of rotation, and rotation speed.
Enabling rapid prototyping
To simplify the sensor integration process for a user, the researchers built a SolidWorks extension. A maker specifies the rotating and stationary parts of their mechanism, as well as the center of rotation, and then the system automatically adds sensor patches to the model.
“It doesn’t change the design at all. It just replaces part of the device with a different material, in this case conductive material,” AlAlawi says.
The researchers used their system to prototype several devices, including a smart desk lamp that changes the color and brightness of its light depending on how the user rotates the bottom or middle of the lamp. They also produced a planetary gearbox, like those that are used in robotic arms, and a wheel that measures distance as it rolls across a surface.
As they prototyped, the team also conducted technical experiments to fine-tune their sensor design. They found that, as they reduced the size of the patches, the amount of error in the sensor data increased.
“In an effort to generate electronic devices with very little e-waste, we want devices with smaller footprints that can still perform well. If we take our same approach and perhaps use a different material or manufacturing process, I think we can scale down while accumulating less error using the same geometry,” she says.
In addition to testing different materials, AlAlawi and her collaborators plan to explore how they could increase the robustness of their sensor design to external noise, and also develop printable sensors for other types of moving mechanisms.
This research was funded, in part, by Accenture Labs.
Science & Technology
Where the sidewalk ends
Most cities don’t map their own pedestrian networks. Now, researchers have built the first open-source tool to let planners do just that

Written by Peter Dizikes, MIT News Office
It’s easier than ever to view maps of any place you’d like to go — by car, that is. By foot is another matter. Most cities and towns in the U.S. do not have sidewalk maps, and pedestrians are usually left to fend for themselves: Can you walk from your hotel to the restaurants on the other side of the highway? Is there a shortcut from downtown to the sports arena? And how do you get to that bus stop, anyway?
Now MIT researchers, along with colleagues from multiple other universities, have developed an open-source tool that uses aerial imagery and image-recognition to create complete maps of sidewalks and crosswalks. The tool can help planners, policymakers, and urbanists who want to expand pedestrian infrastructure.
“In the urban planning and urban policy fields, this is a huge gap,” says Andres Sevtsuk, an associate professor at MIT and a co-author of a new paper detailing the tool’s capabilities. “Most U.S. city governments know very little about their sidewalk networks. There is no data on it. The private sector hasn’t taken on the task of mapping it. It seemed like a really important technology to develop, especially in an open-source way that can be used by other places.”
The tool, called TILE2NET, has been developed using a few U.S. areas as initial sources of data, but it can be refined and adapted for use anywhere.
“We thought we needed a method that can be scalable and used in different cities,” says Maryam Hosseini, a postdoc in MIT’s City Form Lab in the Department of Urban Studies and Planning (DUSP), whose research has focused extensively on the development of the tool.
The paper, “Mapping the Walk: A Scalable Computer Vision Approach for Generating Sidewalk Network Datasets from Aerial Imagery,” appears online in the journal Computers, Environment and Urban Systems. The authors are Hosseini; Sevtsuk, who is the Charles and Ann Spaulding Career Development Associate Professor of Urban Science and Planning in DUSP and head of MIT’s City Form Lab; Fabio Miranda, an assistant professor of computer science at the University of Illinois at Chicago; Roberto M. Cesar, a professor of computer science at the University of Sao Paulo; and Claudio T. Silva, Institute Professor of Computer Science and Engineering at New York University (NYU) Tandon School of Engineering, and professor of data science at the NYU Center for Data Science.
Significant research for the project was conducted at NYU when Hosseini was a student there, working with Silva as a co-advisor.
There are multiple ways to attempt to map sidewalks and other pedestrian pathways in cities and towns. Planners could make maps manually, which is accurate but time-consuming; or they could use roads and make assumptions about the extent of sidewalks, which would reduce accuracy; or they could try tracking pedestrians, which probably would be limited in showing the full reach of walking networks.
Instead, the research team used computerized image-recognition techniques to build a tool that will visually recognize sidewalks, crosswalks, and footpaths. To do that, the researchers first used 20,000 aerial images from Boston, Cambridge, New York City, and Washington — places where comprehensive pedestrian maps already existed. By training the image-recognition model on such clearly defined objects and using portions of those cities as a starting point, they were able to see how well TILE2NET would work elsewhere in those cities.
Ultimately the tool worked well, recognizing 90 percent or more of all sidewalks and crosswalks in Boston and Cambridge, for instance. Having been trained visually on those cities, the tool can be applied to other metro areas; people elsewhere can now plug their aerial imagery into TILE2NET as well.
“We wanted to make it easier for cities in different parts of the world to do such a thing without needing to do the heavy lifting of training [the tool],” says Hosseini. “Collaboratively we will make it better and better, hopefully, as we go along.”
The need for such a tool is vast, emphasizes Sevtsuk, whose research centers on pedestrian and nonmotorized movement in cities, and who has developed multiple kinds of pedestrian-mapping tools in his career. Most cities have wildly incomplete networks of sidewalks and paths for pedestrians, he notes. And yet it is hard to expand those networks efficiently without mapping them.
“Imagine that we had the same gaps in car networks that pedestrians have in their networks,” Sevtsuk says. “You would drive to an intersection and then the road just ends. Or you can’t take a right turn since there is no road. That’s what [pedestrians] are constantly up against, and we don’t realize how important continuity is for [pedestrian] networks.”
In the still larger picture, Sevtsuk observes, the continuation of climate change means that cities will have to expand their infrastructure for pedestrians and cyclists, among other measures; transportation remains a huge source of carbon dioxide emissions.
“When cities talk about cutting carbon emissions, there’s no other way to make a big dent than to address transportation,” Sevtsuk says. “The whole world of urban data for public transit and pedestrians and bicycles is really far behind [vehicle data] in quality. Analyzing how cities can be operational without a car requires this kind of data.”
On the bright side, Sevtsuk suggests, adding pedestrian and bike infrastructure “is being done more aggressively than in many decades in the past. In the 20th century, it was the other way around, we would take away sidewalks to make space for vehicular roads. We’re now seeing the opposite trend. To make best use of pedestrian infrastructure, it’s important that cities have the network data about it. Now you can truly tell how somebody can get to a bus stop.”
Science & Technology
Low-cost device can measure air pollution anywhere
Open-source tool from MIT’s Senseable City Lab lets people check air quality, cheaply

Written by Peter Dizikes, MIT News Office
Air pollution is a major public health problem: The World Health Organization has estimated that it leads to over 4 million premature deaths worldwide annually. Still, it is not always extensively measured. But now an MIT research team is rolling out an open-source version of a low-cost, mobile pollution detector that could enable people to track air quality more widely.
The detector, called Flatburn, can be made by 3D printing or by ordering inexpensive parts. The researchers have now tested and calibrated it in relation to existing state-of-the-art machines, and are publicly releasing all the information about it — how to build it, use it, and interpret the data.
“The goal is for community groups or individual citizens anywhere to be able to measure local air pollution, identify its sources, and, ideally, create feedback loops with officials and stakeholders to create cleaner conditions,” says Carlo Ratti, director of MIT’s Senseable City Lab.
“We’ve been doing several pilots around the world, and we have refined a set of prototypes, with hardware, software, and protocols, to make sure the data we collect are robust from an environmental science point of view,” says Simone Mora, a research scientist at Senseable City Lab and co-author of a newly published paper detailing the scanner’s testing process. The Flatburn device is part of a larger project, known as City Scanner, using mobile devices to better understand urban life.
“Hopefully with the release of the open-source Flatburn we can get grassroots groups, as well as communities in less developed countries, to follow our approach and build and share knowledge,” says An Wang, a researcher at Senseable City Lab and another of the paper’s co-authors.
The paper, “Leveraging Machine Learning Algorithms to Advance Low-Cost Air Sensor Calibration in Stationary and Mobile Settings,” appears in the journal Atmospheric Environment.
In addition to Wang, Mora, and Ratti the study’s authors are: Yuki Machida, a former research fellow at Senseable City Lab; Priyanka deSouza, an assistant professor of urban and regional planning at the University of Colorado at Denver; Tiffany Duhl, a researcher with the Massachusetts Department of Environmental Protection and a Tufts University research associate at the time of the project; Neelakshi Hudda, a research assistant professor at Tufts University; John L. Durant, a professor of civil and environmental engineering at Tufts University; and Fabio Duarte, principal research scientist at Senseable City Lab.
The Flatburn concept at Senseable City Lab dates back to about 2017, when MIT researchers began prototyping a mobile pollution detector, originally to be deployed on garbage trucks in Cambridge, Massachusetts. The detectors are battery-powered and rechargable, either from power sources or a solar panel, with data stored on a card in the device that can be accessed remotely.
The current extension of that project involved testing the devices in New York City and the Boston area, by seeing how they performed in comparison to already-working pollution detection systems. In New York, the researchers used 5 detectors to collect 1.6 million data points over four weeks in 2021, working with state officials to compare the results. In Boston, the team used mobile sensors, evaluating the Flatburn devices against a state-of-the-art system deployed by Tufts University along with a state agency.
In both cases, the detectors were set up to measure concentrations of fine particulate matter as well as nitrogen dioxide, over an area of about 10 meters. Fine particular matter refers to tiny particles often associated with burning matter, from power plants, internal combustion engines in autos and fires, and more.
The research team found that the mobile detectors estimated somewhat lower concentrations of fine particulate matter than the devices already in use, but with a strong enough correlation so that, with adjustments for weather conditions and other factors, the Flatburn devices can produce reliable results.
“After following their deployment for a few months we can confidently say our low-cost monitors should behave the same way [as standard detectors],” Wang says. “We have a big vision, but we still have to make sure the data we collect is valid and can be used for regulatory and policy purposes,”
Duarte adds: “If you follow these procedures with low-cost sensors you can still acquire good enough data to go back to [environmental] agencies with it, and say, ‘Let’s talk.’”
The researchers did find that using the units in a mobile setting — on top of automobiles — means they will currently have an operating life of six months. They also identified a series of potential issues that people will have to deal with when using the Flatburn detectors generally. These include what the research team calls “drift,” the gradual changing of the detector’s readings over time, as well as “aging,” the more fundamental deterioration in a unit’s physical condition.
Still, the researchers believe the units will function well, and they are providing complete instructions in their release of Flatburn as an open-source tool. That even includes guidance for working with officials, communities, and stakeholders to process the results and attempt to shape action.
“It’s very important to engage with communities, to allow them to reflect on sources of pollution,” says Mora.
“The original idea of the project was to democratize environmental data, and that’s still the goal,” Duarte adds. “We want people to have the skills to analyze the data and engage with communities and officials.”
-
Business & Economy10 months ago
NSE Academy Limited collaborates with HDFC Mutual Fund for financial awareness program
-
Business & Economy8 months ago
Using artificial intelligence to control digital manufacturing
-
Edu News10 months ago
Technique protects privacy when making online recommendations
-
Edu News9 months ago
Astronomers discover a multiplanet system nearby
-
Edu News10 months ago
Search reveals eight new sources of black hole echoes
-
Edu News9 months ago
Stronger security for smart devices
-
Edu News8 months ago
Jasudben ML School celebrated its first edition of Pride Month
-
Edu News8 months ago
Russian Edu Fair Held