GeoBizML: Geospatial Data Visualization, Analysis, and Machine Learning¶

Overview¶

GeoBizML is a powerful Python package designed for geospatial data visualization, analysis, and machine learning. It provides a comprehensive set of tools to help data scientists, analysts, and researchers extract valuable insights from spatial data. Whether you're working on urban planning, environmental studies, or any other field involving geospatial data, GeoBizML streamlines the process of handling, visualizing, and analyzing such data.

Key Features¶

Interactive Mapping: Easily create interactive maps using HTML and OpenStreetMap. Visualize geospatial data dynamically within Jupyter Notebooks or web applications.
Geospatial Data Manipulation: Utilize GeoPandas to handle geospatial data efficiently. Perform spatial operations, merge datasets, and manage geometries with ease.
Advanced Analytics: Implement clustering algorithms such as KMeans to uncover patterns in your spatial data. Visualize clusters on maps and gain deeper insights into your data distribution.
Visualization Tools: Leverage Matplotlib and Cartopy to create detailed, publication-quality maps and plots. Customize visualizations to meet your specific needs.
Voronoi Diagrams and Convex Hulls: Generate Voronoi diagrams and convex hulls to analyze spatial relationships and boundaries.
Interpolation and Grid Data: Perform spatial interpolation using Scipy's griddata function. Create smooth surfaces and contour maps from scattered data points.

Tools and Libraries¶

GeoBizML integrates a variety of powerful libraries to provide its extensive functionality:

IPython.display: For displaying HTML, JavaScript, and other interactive elements within Jupyter Notebooks.
GeoPandas: For geospatial data manipulation, providing the ability to read, write, and analyze geospatial data.
Shapely: For geometric operations and spatial analysis, enabling the creation and manipulation of complex geometric shapes.
Scikit-learn: For machine learning algorithms, such as KMeans, facilitating clustering and other machine learning tasks.
Matplotlib: For creating static, animated, and interactive visualizations in Python.
NumPy: For numerical operations and handling large datasets efficiently.
Scipy: For advanced mathematical and scientific computations, including spatial interpolation.
Cartopy: For creating maps and handling projections, making it easier to visualize geospatial data on different map projections.
Requests: For making HTTP requests, useful for fetching data from online sources.

Installation¶

To install GeoBizML, use pip:

pip install geobizml

Data Visualization with geobizml¶

In [1]:

from geobizml import visualization as viz
import random

In [2]:

# Define the bounding box for a fictional town (latitude, longitude)
town_area = {
    'min_lat': 45.5,
    'max_lat': 45.6,
    'min_lon': -122.7,
    'max_lon': -122.6
}

# Generate random points with values within the town area
def generate_random_points_with_values(num_points, area):
    points = []
    for _ in range(num_points):
        lat = random.uniform(area['min_lat'], area['max_lat'])
        lon = random.uniform(area['min_lon'], area['max_lon'])
        value = random.uniform(0, 100)  # Random value for demonstration
        points.append({'lon': lon, 'lat': lat, 'value': value})
    return points

# Generate 100 random points with values within the town area
num_points = 100
points_with_values = generate_random_points_with_values(num_points, town_area)
# Each point is a dictionary, so read the 'value' key rather than unpacking
values = [point['value'] for point in points_with_values]

In [5]:

import pandas as pd

In [7]:

df = pd.DataFrame(points_with_values)

DataFrame must contain the columns 'lon' and 'lat' ('value' is optional)¶

1. `lat` (Latitude)¶

Description: The lat column should contain the latitude coordinates of the geospatial data points.
Type: Floating-point numbers.
Range: Values must be between -90.0 and 90.0.
Coordinate System: WGS84 (World Geodetic System 1984).
Example:

df['lat'] = [34.0522, 36.1699, 40.7128]

2. `lon` (Longitude)¶

Description: The lon column should contain the longitude coordinates of the geospatial data points.
Type: Floating-point numbers.
Range: Values must be between -180.0 and 180.0.
Coordinate System: WGS84 (World Geodetic System 1984).
Example:

df['lon'] = [-118.2437, -115.1398, -74.0060]
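The column requirements above can be checked before plotting. The following is a minimal sketch using pandas only; `validate_points_frame` is a hypothetical helper written for this illustration and is not part of GeoBizML.

```python
import pandas as pd

def validate_points_frame(df):
    """Return a list of problems found in the 'lat'/'lon' columns (empty if none)."""
    problems = []
    for col, low, high in (("lat", -90.0, 90.0), ("lon", -180.0, 180.0)):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        # Coerce to numbers; anything that cannot be parsed becomes NaN
        values = pd.to_numeric(df[col], errors="coerce")
        if values.isna().any():
            problems.append(f"{col} has non-numeric or missing values")
        if ((values < low) | (values > high)).any():
            problems.append(f"{col} has values outside [{low}, {high}]")
    return problems

good = pd.DataFrame({"lat": [34.0522, 36.1699, 40.7128],
                     "lon": [-118.2437, -115.1398, -74.0060]})
bad = pd.DataFrame({"lat": [95.0], "lon": [-74.0060]})

print(validate_points_frame(good))
print(validate_points_frame(bad))
```

An empty list means both coordinate columns are present, numeric, and within the WGS84 ranges listed above.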

In [10]:

df.head()

Out[10]:

	lon	lat	value
0	-122.663368	45.597457	37.227892
1	-122.603317	45.519464	86.671381
2	-122.626266	45.581551	85.639504
3	-122.604935	45.569549	71.319544
4	-122.654830	45.539957	73.268159

Convert to a list of dictionaries¶

In [11]:

points_with_values = df.to_dict('records')

Function: plot_points_with_dot_markers(points_with_values, dot_size=5, dot_color='blue', hue_column=None)¶

Overview¶

The plot_points_with_dot_markers function in GeoBizML is designed to visualize geospatial data points on a map using dot markers, allowing for easy customization of dot size, color, and optionally using a hue column for categorical coloring. This function is particularly useful for creating clear and informative point plots on maps, enhancing the visual representation of geospatial data.

Parameters

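points_with_values: The geospatial data points to be plotted, described as a GeoDataFrame or DataFrame. The examples in this notebook pass a list of record dictionaries (from df.to_dict('records')) with 'lon', 'lat', and optionally 'value' keys.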

dot_size: Optional. Integer specifying the size of the dot markers. Default is 5.

dot_color: Optional. String specifying the color of the dot markers. Default is 'blue'.

hue_column: Optional. String specifying the column name in points_with_values DataFrame to use for categorical coloring of dot markers. If provided, dots will be colored according to unique values in this column.

Creating a Point Plot¶

In [13]:

viz.plot_points_with_dot_markers(points_with_values, dot_size=5, dot_color='blue', hue_column=None)

Out[13]:

Creating a Point Plot with Hue¶

To create a point plot on a map with GeoBizML, use the plot_points_with_dot_markers function. Passing hue_column colors the points according to the values in that column (here, the value column). Make sure that your dataset contains the value column.

In [12]:

viz.plot_points_with_dot_markers(points_with_values, dot_size=5, dot_color=None, hue_column='value')

Out[12]:

Function: plot_points_bubble(points_with_values, dot_size=5, dot_color='red', hue_column='value', weight=0.2)¶

Overview¶

The plot_points_bubble function in GeoBizML enables the creation of a bubble plot on a map, where the size of each point (bubble) is adjusted based on a weight column in the dataset. This function is useful for visualizing geospatial data points with varying magnitudes or quantities represented by the size of the bubbles, enhancing the representation of data distribution across geographical locations.

Parameters¶

points_with_values: A GeoDataFrame or DataFrame containing geospatial data points to be plotted.

dot_size: Optional. Integer specifying the base size of the dot markers (bubbles). Default is 5.

dot_color: Optional. String specifying the color of the dot markers (bubbles). Default is 'red'.

hue_column: Optional. String specifying the column name in points_with_values DataFrame to use for hue-based coloring of dot markers. Default is 'value'.

weight: Optional. Float specifying the scaling factor for adjusting the size of the dot markers (bubbles) based on the values in hue_column. Default is 0.2.

In [16]:

viz.plot_points_bubble(points_with_values, dot_size=5, dot_color='red', hue_column='value', weight = 0.2)

Out[16]:

Unlocking Business Potential with Cluster Analysis¶

In today's data-driven world, businesses are inundated with vast amounts of data from various sources. From customer demographics to purchasing behavior, this data holds the potential to reveal invaluable insights. However, making sense of this data can be challenging. This is where cluster analysis comes into play. By grouping similar data points together, cluster analysis helps businesses identify patterns and make informed decisions. This section explains what cluster analysis is, its key techniques, their mathematical formulations, and its importance in various industries.

What is Cluster Analysis?¶

Cluster analysis is a statistical technique used to group similar objects or data points into clusters based on specific criteria. The goal is to ensure that data points within a cluster are more similar to each other than to those in other clusters. This technique helps uncover hidden structures and patterns within large datasets, making it easier to understand and analyze complex information.

Key Terminologies¶

Cluster: A group of similar data points.
Centroid: The center point of a cluster, representing the mean position of all points in the cluster.
Dendrogram: A tree-like diagram used to illustrate the arrangement of clusters produced by hierarchical clustering.
Noise: Data points that do not belong to any cluster (often identified in density-based clustering).
Silhouette Score: A measure of how similar an object is to its own cluster compared to other clusters.

Key Techniques in Cluster Analysis¶

Cluster analysis encompasses various techniques, each suited to different types of data and business needs. Here are some of the most commonly used methods:

1. K-means Clustering¶

Algorithm Overview: K-means clustering partitions data into K clusters, with each cluster having a centroid. Data points are assigned to the nearest centroid, and centroids are recalculated iteratively until convergence.

Mathematical Formulation:

1. Initialize K centroids randomly.
2. Assign each data point $( x_i )$ to the nearest centroid $( \mu_k )$ using the Euclidean distance: $$ d(x_i, \mu_k) = \sqrt{\sum_{j=1}^n (x_{ij} - \mu_{kj})^2} $$
3. Update centroids by calculating the mean of all points in the cluster: $$ \mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i $$
4. Repeat steps 2 and 3 until the centroids do not change significantly.
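The iteration described above can be sketched in a few lines of NumPy. This is an illustrative implementation of the standard procedure, not the code GeoBizML or scikit-learn runs; the function name `kmeans`, its defaults, and the rule for empty clusters are assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment: Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points
        # (assumption: an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop once the centroids have essentially stopped moving
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids

demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(demo, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label numbers.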

Key Terminologies: Centroid, Euclidean Distance, Iteration

Pros and Cons:

Pros: Simple, scalable, and efficient for large datasets.
Cons: Requires pre-specification of the number of clusters, sensitive to initial centroid selection.

Application Example: Market segmentation, image compression, anomaly detection.

2. Hierarchical Clustering¶

Algorithm Overview: Builds a hierarchy of clusters using either an agglomerative (bottom-up) or divisive (top-down) approach. It does not require the number of clusters to be specified in advance.

Mathematical Formulation:

Agglomerative Approach:
- Start with each data point as a single cluster.
- Merge the closest pair of clusters iteratively using a linkage criterion (e.g., single, complete, average, Ward's method) until all points are in one cluster.
Linkage Methods:
- Single Linkage (Minimum distance): $$ d(C_i, C_j) = \min_{x \in C_i, y \in C_j} d(x, y) $$
- Complete Linkage (Maximum distance): $$ d(C_i, C_j) = \max_{x \in C_i, y \in C_j} d(x, y) $$
- Average Linkage: $$ d(C_i, C_j) = \frac{1}{|C_i| |C_j|} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y) $$
- Ward's Method: $$ d(C_i, C_j) = \sqrt{\frac{2 |C_i||C_j|}{|C_i|+|C_j|}} \| \mu_i - \mu_j \| $$
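To make the four linkage formulas concrete, the sketch below evaluates each of them for two small, hand-picked clusters using NumPy. The helper name `linkage_distances` and the sample coordinates are made up for illustration.

```python
import numpy as np

def linkage_distances(A, B):
    """Distance between clusters A and B under the four linkage criteria."""
    # All pairwise Euclidean distances between points of A and points of B
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    n_a, n_b = len(A), len(B)
    centroid_gap = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
    return {
        "single": d.min(),      # closest pair
        "complete": d.max(),    # farthest pair
        "average": d.mean(),    # mean over all |A| * |B| pairs
        "ward": np.sqrt(2 * n_a * n_b / (n_a + n_b)) * centroid_gap,
    }

C_i = np.array([[0.0, 0.0], [1.0, 0.0]])
C_j = np.array([[3.0, 0.0], [5.0, 0.0]])
result = linkage_distances(C_i, C_j)
print(result)
```

For these points the pairwise distances are 3, 5, 2 and 4, so single linkage gives 2, complete linkage gives 5, and average linkage gives 3.5. The centroids are 3.5 apart, so Ward's distance is sqrt(2) * 3.5, roughly 4.95.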

Key Terminologies: Dendrogram, Linkage Criteria

Pros and Cons:

Pros: Produces a dendrogram for easy visualization of cluster relationships, does not require the number of clusters to be predetermined.
Cons: Computationally intensive, less efficient for large datasets.

Application Example: Document clustering, social network analysis, image segmentation.

3. Density-Based Clustering (DBSCAN)¶

Algorithm Overview: Identifies clusters based on dense regions of data points separated by sparser areas. Points in high-density regions are clustered together, while points in low-density regions are considered noise.

Mathematical Formulation:

Define two parameters: $( \epsilon )$ (maximum radius of the neighborhood) and $( MinPts )$ (minimum number of points in a neighborhood to form a dense region).
Classify points as:
- Core Points: If at least $( MinPts )$ points are within distance $( \epsilon )$.
- Border Points: If fewer than $( MinPts )$ points are within $( \epsilon )$, but it is in the neighborhood of a core point.
- Noise Points: If it is neither a core point nor a border point.
Expand clusters from core points by recursively including points that are within $( \epsilon )$ distance of any core point.
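The core, border, and noise definitions can be illustrated with a short NumPy sketch. It only classifies the points and does not perform the cluster expansion step. One assumption is labeled in the code: a point counts as its own neighbor, which is the convention scikit-learn's DBSCAN uses for its min_samples parameter. The function name `classify_points` is hypothetical.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Assumption: the eps-neighborhood of a point includes the point itself
    neighbors = dists <= eps
    is_core = neighbors.sum(axis=1) >= min_pts
    # A border point is not core but has at least one core point within eps
    is_border = ~is_core & (neighbors & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(is_border, "border", "noise"))

demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],  # dense square
                 [2.2, 1.0],                                      # near the square
                 [8.0, 8.0]])                                     # isolated
kinds = classify_points(demo, eps=1.5, min_pts=4)
print(kinds.tolist())
```

With eps=1.5 and min_pts=4, the four corners of the square are core points, the point at (2.2, 1.0) is a border point because its only neighbor is the core point (1.0, 1.0), and the isolated point is noise.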

Key Terminologies: Core Points, Border Points, Noise, Epsilon ($( \epsilon )$)

Pros and Cons:

Pros: Can find arbitrarily shaped clusters, handles noise effectively.
Cons: Requires selection of parameters (epsilon and minimum points), less effective for varying densities.

Application Example: Geospatial analysis, fraud detection, biological data analysis.

4. Gaussian Mixture Models (GMM)¶

Algorithm Overview: Assumes data is generated from a mixture of several Gaussian distributions. It uses the Expectation-Maximization (EM) algorithm to estimate the parameters of these distributions.

Mathematical Formulation:

Expectation Step (E-Step): Calculate the probability that each data point belongs to each cluster (responsibilities) based on current parameter estimates: $$ \gamma_{ik} = \frac{\pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_i | \mu_j, \Sigma_j)} $$ where $( \pi_k )$ is the mixing coefficient, and $( \mathcal{N} )$ is the Gaussian distribution.
Maximization Step (M-Step): Update parameters based on the responsibilities calculated in the E-Step: $$ \mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} x_i $$ $$ \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T $$ $$ \pi_k = \frac{N_k}{N} $$ where $( N_k = \sum_{i=1}^{N} \gamma_{ik} )$.
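The E-step and M-step updates can be traced with a small NumPy sketch. To keep it short, the example assumes one-dimensional data, so each covariance matrix $( \Sigma_k )$ reduces to a scalar variance. The function names and starting values are illustrative only.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_step(x, pi, mu, var):
    # E-step: responsibilities gamma[i, k] of component k for point i
    weighted = pi[None, :] * gaussian_pdf(x[:, None], mu[None, :], var[None, :])
    gamma = weighted / weighted.sum(axis=1, keepdims=True)
    # M-step: re-estimate means, variances, and mixing coefficients
    N_k = gamma.sum(axis=0)
    mu_new = (gamma * x[:, None]).sum(axis=0) / N_k
    var_new = (gamma * (x[:, None] - mu_new[None, :]) ** 2).sum(axis=0) / N_k
    pi_new = N_k / len(x)
    return gamma, pi_new, mu_new, var_new

x = np.array([-2.1, -1.9, -2.0, 3.0, 3.2, 2.8])
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(20):
    gamma, pi, mu, var = em_step(x, pi, mu, var)

print("means:", mu)
print("mixing coefficients:", pi)
```

Each row of gamma sums to 1, which is what makes the assignments probabilistic rather than hard. A production implementation would typically work with log-probabilities and guard against variances collapsing toward zero.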

Key Terminologies: Gaussian Distribution, Expectation-Maximization, Mixing Coefficient

Pros and Cons:

Pros: Provides probabilistic cluster assignments, flexible in terms of cluster shapes.
Cons: Computationally intensive, sensitive to initialization.

Application Example: Market segmentation, customer behavior analysis, anomaly detection.

Evaluating Cluster Quality¶

Evaluating the quality of clusters is essential to ensure meaningful and actionable insights. Several metrics can be used to assess cluster validity:

Silhouette Score¶

Measures how similar a data point is to its own cluster compared to other clusters. Values range from -1 to 1, with higher values indicating better-defined clusters. $$ s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} $$ where $( a(i) )$ is the average distance from the i-th point to the other points in the same cluster, and $( b(i) )$ is the minimum average distance to points in a different cluster.
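The formula for $( s(i) )$ can be evaluated directly. The sketch below is a plain NumPy version for small datasets; the function name `silhouette_values` is made up here, and singleton clusters are given a score of 0, which is a common convention rather than something stated above.

```python
import numpy as np

def silhouette_values(X, labels):
    """Silhouette coefficient s(i) for every point."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # convention: a singleton cluster scores 0
        # a(i): mean distance to the other points in the same cluster
        a = dists[i, same].sum() / (same.sum() - 1)
        # b(i): smallest mean distance to the points of any other cluster
        b = min(dists[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

demo = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
demo_labels = np.array([0, 0, 1, 1])
s = silhouette_values(demo, demo_labels)
print(s, s.mean())
```

For the first point, a = 1 and b = (10 + 11) / 2 = 10.5, so s = 9.5 / 10.5, about 0.905. The two tight, well-separated pairs therefore give an average score close to 0.9.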

Elbow Method¶

Plots the within-cluster sum of squares (WCSS) against the number of clusters. The "elbow" point, where the curve stops dropping steeply and begins to flatten, suggests a suitable number of clusters.
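An elbow curve can be computed with scikit-learn, whose KMeans estimator exposes the within-cluster sum of squares as the `inertia_` attribute. The sketch below uses synthetic data with three well-separated groups (an assumption made for the demo) and prints the values instead of plotting them.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic demo data: three groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
demo = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

ks = range(1, 7)
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(demo)
    wcss.append(model.inertia_)  # within-cluster sum of squares for this k

for k, value in zip(ks, wcss):
    print(f"k={k}: WCSS={value:.1f}")
```

On data like this the WCSS falls sharply up to k=3 and only slightly afterwards, which is the bend the method looks for.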

Davies-Bouldin Index¶

Computes the average similarity ratio between each cluster and its most similar cluster. Lower values indicate better clustering. $$ DB = \frac{1}{N} \sum_{i=1}^{N} \max_{j \ne i} \left( \frac{s_i + s_j}{d_{ij}} \right) $$ where $( N )$ is the number of clusters, $( s_i )$ and $( s_j )$ are the average distances from the points in clusters $( i )$ and $( j )$ to their respective centroids, and $( d_{ij} )$ is the distance between the centroids of clusters $( i )$ and $( j )$.
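The index can be computed by hand for a small example. The NumPy sketch below takes the within-cluster scatter to be the mean distance of a cluster's points to its centroid; the function name `davies_bouldin` is illustrative.

```python
import numpy as np

def davies_bouldin(X, labels):
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # s_i: mean distance of the points in cluster i to its centroid
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(ids)
    ])
    worst = []
    for i in range(len(ids)):
        # Similarity of cluster i to every other cluster; keep the largest
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))

demo = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
demo_labels = np.array([0, 0, 1, 1])
db = davies_bouldin(demo, demo_labels)
print(db)
```

Here both clusters have a scatter of 1 and their centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2.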

Dunn Index¶

Measures the ratio between the smallest distance among points in different clusters and the largest intra-cluster distance. Higher values indicate better clustering. $$ DI = \frac{\min_{1 \leq i < j \leq c} \delta(C_i, C_j)}{\max_{1 \leq k \leq c} \Delta(C_k)} $$ where $( \delta(C_i, C_j) )$ is the inter-cluster distance between clusters $( C_i )$ and $( C_j )$, and $( \Delta(C_k) )$ is the intra-cluster distance of cluster $( C_k )$.
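A direct NumPy evaluation of the Dunn index is sketched below. It assumes the inter-cluster distance is the closest pair of points from different clusters and the intra-cluster distance is the cluster diameter (its farthest pair), which matches the description above; other definitions of both terms exist.

```python
import numpy as np

def dunn_index(X, labels):
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    min_between = dists[~same].min()  # closest pair from different clusters
    max_within = dists[same].max()    # largest cluster diameter
    # Note: undefined (division by zero) if every cluster is a single point
    return min_between / max_within

demo = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
demo_labels = np.array([0, 0, 1, 1])
di = dunn_index(demo, demo_labels)
print(di)
```

The closest points in different clusters are 8 apart and each cluster has a diameter of 2, so the index is 4.0.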

Real-World Applications of Cluster Analysis¶

Cluster analysis is widely used across various industries to drive business success:

Retail: Retailers use clustering to segment customers based on purchasing behavior, enabling personalized marketing and promotions.
Healthcare: Clustering helps in identifying patient groups with similar health conditions for targeted treatment plans and resource allocation.
Finance: Financial institutions use clustering for customer segmentation, fraud detection, and risk management.
Marketing: Marketers leverage clustering to create targeted campaigns, understand customer preferences, and optimize marketing strategies.
Telecommunications: Telecom companies use clustering to identify usage patterns, reduce churn, and develop personalized service plans.

Conclusion¶

Cluster analysis is a powerful tool for uncovering hidden patterns and structures within complex datasets. By grouping similar data points together, businesses can gain valuable insights into customer behavior, optimize marketing strategies, and drive growth. Whether you're in retail, healthcare, finance, or any other industry, leveraging cluster analysis can help you make data-driven decisions and stay ahead of the competition.

Embrace the power of cluster analysis and unlock the full potential of your data. Start transforming your business insights today!

By integrating these elements into your marketing analytics strategy, you can effectively segment your audience, tailor your campaigns, and achieve higher ROI. For further reading and resources, explore advanced clustering techniques and their applications in various business contexts.

Cluster Analysis with GeoBizML¶

Import clusterAnalysis from geobizml¶

In [17]:

from geobizml import clusterAnalysis as ca

Let's create demo data¶

In [18]:

import pandas as pd
import random
import numpy as np
import geopandas as gpd

In [19]:

# Seed NumPy's global random number generator.
# Note: the points below are drawn with Python's `random` module, which this
# seed does not affect, so the generated coordinates differ between runs.
np.random.seed(42)

# Define the bounding box for a fictional town (latitude, longitude)
town_area = {
    'min_lat': 45.5,
    'max_lat': 45.6,
    'min_lon': -122.7,
    'max_lon': -122.6
}

# Generate random points with values within the town area
def generate_random_points_with_values(num_points, area):
    points = []
    for _ in range(num_points):
        lat = random.uniform(area['min_lat'], area['max_lat'])
        lon = random.uniform(area['min_lon'], area['max_lon'])
        value = random.uniform(0, 100)  # Random value for demonstration
        points.append({'lon': lon, 'lat': lat, 'value': value})
    return points

# Generate 100 random points with values within the town area
num_points = 100
points = generate_random_points_with_values(num_points, town_area)

In [20]:

pt = pd.DataFrame(points)

In [21]:

pt.head()

Out[21]:

	lon	lat	value
0	-122.676043	45.599382	12.014330
1	-122.627892	45.507520	45.577596
2	-122.601732	45.509102	36.951152
3	-122.654158	45.528881	86.352023
4	-122.695080	45.566494	60.706180

Prepare Dataset for Geo-spatial cluster analysis¶

In [22]:

X = np.array(list(zip(pt['lon'], pt['lat'])))

Determining the optimum number of clusters¶

Function: plot_elbow(X, max_clusters=10)¶

Overview¶

The plot_elbow function in GeoBizML is used to visualize the "elbow" or "knee" point in a clustering analysis, specifically for the K-means clustering algorithm. This technique helps in determining the optimal number of clusters by plotting the within-cluster sum of squares (WCSS) against a range of cluster numbers. The point where the plot bends or forms an elbow shape indicates the optimal number of clusters for the given dataset.

Parameters¶

X: Input data array or DataFrame (n_samples, n_features) on which clustering will be performed.

max_clusters: Optional. Integer specifying the maximum number of clusters to consider for plotting the elbow curve. Default is 10.

Visualization¶

The plot_elbow function generates a plot that shows the within-cluster sum of squares (WCSS) against the number of clusters (k). The optimal number of clusters is typically identified at the "elbow" or "knee" point in the plot, where the rate of decrease in WCSS slows down, indicating diminishing returns from adding more clusters.

Interpretation¶

Elbow Point: The optimal number of clusters is often determined at the point where adding more clusters does not significantly reduce the WCSS. This point represents a balance between model complexity (number of clusters) and clustering performance (compactness of clusters).

Conclusion¶

With the plot_elbow function in GeoBizML, you can effectively determine the optimal number of clusters for your data using the K-means clustering algorithm. This helps in making informed decisions about clustering model selection and application to geospatial data analysis tasks.

In [23]:

ca.plot_elbow(X, max_clusters=10)

Function: plot_scree(X, max_clusters=10)¶

Overview¶

The plot_scree function in GeoBizML is used to generate a scree plot, which helps in determining the optimal number of clusters or principal components by visualizing the explained variance or eigenvalues. This plot is particularly useful in clustering analysis and principal component analysis (PCA), providing insights into the variance explained by each cluster or principal component.

Parameters¶

X: Input data array or DataFrame (n_samples, n_features) on which clustering or PCA will be performed.

max_clusters: Optional. Integer specifying the maximum number of clusters or principal components to consider for plotting the scree curve. Default is 10.

Visualization¶

The plot_scree function generates a plot that shows the explained variance or eigenvalues against the number of clusters or principal components. In clustering analysis, this plot helps in identifying the optimal number of clusters based on the variance explained by each cluster. In PCA, it helps in determining the number of principal components that capture significant variance in the data.

Interpretation¶

Elbow Point: For clustering analysis, the optimal number of clusters is typically identified at the point where adding more clusters does not significantly increase the explained variance. Eigenvalues: In PCA, the scree plot shows the eigenvalues of each principal component, with significant eigenvalues indicating principal components that explain a large portion of the variance in the data.
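For the PCA reading of a scree plot, the eigenvalues of the data's covariance matrix give the variance captured by each principal component. The NumPy sketch below uses synthetic, strongly correlated two-column data (an assumption made for the demo) and is not a description of how plot_scree is implemented.

```python
import numpy as np

# Synthetic demo data: the second column is mostly a scaled copy of the first
rng = np.random.default_rng(0)
t = rng.normal(size=200)
demo = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=200)])

# Eigenvalues of the covariance matrix, sorted from largest to smallest
eigenvalues = np.linalg.eigvalsh(np.cov(demo, rowvar=False))[::-1]
explained = eigenvalues / eigenvalues.sum()

for i, (ev, ratio) in enumerate(zip(eigenvalues, explained), start=1):
    print(f"component {i}: eigenvalue={ev:.4f}, explained variance={ratio:.1%}")
```

Because the two columns are nearly collinear, the first component carries almost all of the variance and the second eigenvalue is close to zero, which is the kind of sharp drop a scree plot makes visible.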

Conclusion¶

With the plot_scree function in GeoBizML, you can visualize and interpret the explained variance or eigenvalues to determine the optimal number of clusters or principal components for your data analysis tasks. This helps in making informed decisions about model selection and application to geospatial data analysis.

In [24]:

ca.plot_scree(X, max_clusters=10)

Function: plot_silhouette(X, max_clusters=10)¶

Overview¶

The plot_silhouette function in GeoBizML generates a silhouette plot to evaluate the quality of clustering in terms of how well each data point fits into its assigned cluster. Silhouette analysis measures how similar each point is to its own cluster compared to other clusters, providing insights into cluster cohesion and separation. This plot is useful for assessing the appropriateness of the chosen number of clusters in clustering analysis.

Parameters¶

X: Input data array or DataFrame (n_samples, n_features) on which clustering will be performed.

max_clusters: Optional. Integer specifying the maximum number of clusters to consider for plotting the silhouette plot. Default is 10.

Visualization¶

The plot_silhouette function generates a plot that displays the silhouette coefficient for each sample, indicating how well each data point fits into its assigned cluster. The silhouette coefficient ranges from -1 to 1:

Values close to +1 indicate that the sample is well-clustered.
Values close to 0 indicate overlapping clusters.
Values close to -1 indicate that the sample may be assigned to the wrong cluster.

Interpretation

Optimal Clusters: The optimal number of clusters is often identified by maximizing the average silhouette coefficient across all samples. This indicates a good balance between cluster cohesion and separation.
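Choosing the number of clusters by the average silhouette coefficient can be sketched with scikit-learn's `silhouette_score`. The example runs on synthetic data with three well-separated groups (an assumption made for the demo) and is independent of how plot_silhouette is implemented.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic demo data: three groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
demo = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):  # the silhouette needs at least two clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(demo)
    scores[k] = silhouette_score(demo, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: average silhouette={score:.3f}")
print("best k:", best_k)
```

The same loop works with `sklearn.metrics.davies_bouldin_score` in place of `silhouette_score`, in which case the best k is the one with the lowest score.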

Conclusion¶

With the plot_silhouette function in GeoBizML, you can visually assess the quality of clustering solutions and determine the optimal number of clusters for your geospatial data analysis tasks. Silhouette analysis helps in understanding how well-defined and distinct the clusters are, guiding decisions on clustering model selection and parameter tuning.

In [25]:

ca.plot_silhouette(X, max_clusters=10)

Function: plot_davies_bouldin(X, max_clusters=10)¶

Overview¶

The plot_davies_bouldin function in GeoBizML generates a Davies-Bouldin index plot to evaluate the quality of clustering by measuring the average similarity between each cluster and its most similar cluster. This index provides a way to assess cluster separation and cohesion, helping in the determination of the optimal number of clusters for clustering analysis.

Parameters¶

X: Input data array or DataFrame (n_samples, n_features) on which clustering will be performed.

max_clusters: Optional. Integer specifying the maximum number of clusters to consider for plotting the Davies-Bouldin index. Default is 10.

Visualization¶

The plot_davies_bouldin function generates a plot that displays the Davies-Bouldin index for each number of clusters considered. The Davies-Bouldin index is computed as the average similarity measure between each cluster and its most similar cluster.

Lower values of the Davies-Bouldin index indicate better clustering, with well-separated and compact clusters.

Interpretation¶

Optimal Clusters: The optimal number of clusters is often identified by minimizing the Davies-Bouldin index. This indicates clusters that are both well-separated and internally cohesive.

Conclusion¶

With the plot_davies_bouldin function in GeoBizML, you can assess the quality of clustering solutions and determine the optimal number of clusters for your geospatial data analysis tasks. The Davies-Bouldin index provides valuable insights into cluster separation and cohesion, aiding in informed decisions on clustering model selection and parameter tuning.

In [26]:

ca.plot_davies_bouldin(X, max_clusters=10)

Let's do cluster analysis with geobizml¶

Function: generate_clustering_map_html(pt, n_clusters, radius=3, color='red')¶

Overview¶

The generate_clustering_map_html function in GeoBizML performs cluster analysis and generates an interactive HTML map with clustered points displayed using custom markers. This function utilizes clustering algorithms to group geographic points (pt) into n_clusters clusters, and visualizes the results on a map with customizable marker properties.

Parameters¶

pt: DataFrame or GeoDataFrame containing geospatial points with latitude (lat) and longitude (lon) coordinates.

n_clusters: Integer specifying the number of clusters to generate.

radius: Optional. Float specifying the radius of the markers in the generated HTML map. Default is 3.

color: Optional. String specifying the color of the markers in the generated HTML map. Default is 'red'.

Output¶

The generate_clustering_map_html function outputs an interactive HTML map (html_map) displaying clustered points using custom markers. Each cluster is represented by markers with specified radius (radius) and color (color), providing a visual representation of geographical clustering.

Visualization¶

The generated HTML map allows interactive exploration of clustered points, enabling users to view cluster distribution and spatial patterns directly in a web browser. Customizable marker properties enhance the clarity and visual appeal of the cluster analysis results.

Conclusion¶

With the generate_clustering_map_html function in GeoBizML, you can perform cluster analysis on geospatial data and create interactive visualizations effortlessly. This function facilitates insightful exploration of cluster distributions and spatial relationships, supporting informed decision-making in geospatial analysis tasks.

In [27]:

n_clusters = 4

In [28]:

ca.generate_clustering_map_html(pt, n_clusters, radius=3, color='red')

Out[28]:

GeoBizML represents a powerful toolkit designed to empower geospatial analysts, data scientists, and researchers with advanced tools for geospatial data visualization, clustering analysis, and machine learning. By leveraging state-of-the-art algorithms and intuitive visualizations, GeoBizML enables users to unlock actionable insights from their geographic data effortlessly.

Whether you're exploring spatial patterns, conducting cluster analysis, or developing predictive models, GeoBizML streamlines complex workflows and enhances decision-making processes. Its user-friendly interface, coupled with comprehensive documentation and a supportive community, ensures that users can quickly integrate advanced geospatial analytics into their projects.

Explore the potential of GeoBizML today and discover new dimensions in geospatial analysis and visualization. Empower your data-driven initiatives with GeoBizML and transform how you interpret and utilize geographic data.
