Urban Rhapsody: Large-scale exploration of urban soundscapes
Abstract
Noise is one of the primary quality-of-life issues in urban environments. Beyond annoyance, noise negatively impacts public health and educational performance. While low-cost sensors can be deployed to monitor ambient noise levels at high temporal resolution, the volume and complexity of the data they produce pose significant analytical challenges. One way to address these challenges is through machine listening techniques, which extract features from audio in order to classify noise sources and understand the temporal patterns of a city's noise. However, the overwhelming number of noise sources in the urban environment and the scarcity of labeled data make it nearly impossible to create classification models with vocabularies large enough to capture the true dynamism of urban soundscapes. In this paper, we first identify a set of requirements in the as-yet unexplored domain of urban soundscape exploration. To satisfy these requirements and tackle the identified challenges, we propose Urban Rhapsody, a framework that combines state-of-the-art audio representation, machine learning, and visual analytics to allow users to interactively create classification models, understand the noise patterns of a city, and quickly retrieve and label audio excerpts in order to create a large, high-precision annotated database of urban sound recordings. We demonstrate the tool's utility through case studies performed by domain experts using data generated over the five-year deployment of a one-of-a-kind sensor network in New York City.
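The core loop the abstract describes — embed audio clips, let an annotator label a few, then retrieve similar clips as labeling candidates — can be illustrated with a minimal sketch. This is not the paper's implementation: the embeddings below are random stand-ins for real deep audio embeddings, and the retrieval is plain cosine similarity in NumPy.

```python
# Illustrative sketch (not the paper's code): nearest-neighbor retrieval of
# audio clips in embedding space, the building block behind interactive
# "label one clip, retrieve similar ones" workflows.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for deep audio embeddings: 1000 clips, 128-dimensional vectors,
# L2-normalized so a dot product equals cosine similarity.
embeddings = rng.normal(size=(1000, 128))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def nearest_clips(query_idx, k=5):
    """Return indices of the k clips most similar to a labeled query clip."""
    sims = embeddings @ embeddings[query_idx]
    sims[query_idx] = -np.inf  # exclude the query clip itself
    return list(np.argsort(sims)[::-1][:k])

# The annotator labels clip 0; the system proposes 5 candidates to confirm.
candidates = nearest_clips(0)
```

In a real deployment the embeddings would come from a pretrained audio model and the brute-force dot product would be replaced by an approximate nearest-neighbor index once the collection grows to millions of clips.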
Supporting Information
Filename | Description
---|---
cgf14534-sup-0001-S1.mp4 (77.4 MB) | Supporting Information
cgf14534-sup-0002-S1.mp4 (158 MB) | Supporting Information