# Using Databricks with Mach5 Search
Mach5 Search is a cloud-native search and indexing engine optimized for large-scale machine data and log analytics. When integrated with Databricks and Delta Lake, it allows users to run full-text search and aggregations over structured and semi-structured data stored in Delta format, without duplicating the underlying dataset.
Key advantages include:
- Native support for object storage and multiple data lakes
- Separation of indexing and serving workloads
- Elasticsearch-compatible query APIs
- 90% lower TCO compared to traditional ELK stacks
Prerequisites
To integrate Mach5 Search with Databricks, ensure the following:
- A Databricks workspace with Delta Lake tables available
- Access to a supported cloud object store (e.g., AWS S3, Azure Blob, GCS)
- Permissions to run Databricks jobs or notebooks
- A deployed Mach5 Search instance (hosted or self-managed)
- Kibana or notebook interface for querying (optional)
Component Overview
- Delta Lake (Databricks): Primary storage for structured event/log data.
- Mach5 Indexer: Runs in Databricks to convert Delta tables into search indexes.
- Object Storage: Stores the inverted indexes produced by the indexer.
- Mach5 Search: Stateless search layer that loads and serves indexes.
- Query Interface: Kibana dashboards, notebooks, API clients.
Deployment Architecture

- Indexing is decoupled from search querying.
- Indexes are durable and scalable via object storage.
- The stateless architecture allows elastic scaling of search nodes.
Data Flow
- Raw data is ingested into Delta Lake tables in Databricks
- Mach5 Indexer runs as a job or notebook and reads from these tables
- Indexes are written to object storage in optimized format
- Mach5 Search loads indexes and serves queries
- Queries can be executed via the Elasticsearch Query DSL or API clients
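To make the flow above concrete, here is a toy, in-memory sketch of the same stages: rows are indexed once, the index is written to a stand-in for object storage, and a separate search step loads it and answers queries. Everything in it is an illustration under stated assumptions. The function names, the sample rows, and the dictionary used as "object storage" are hypothetical, and the index structure is a plain token-to-row-id map, not Mach5's actual index format.

```python
from collections import defaultdict


def build_index(rows, fields):
    """Indexer step: map each lowercase token to the sorted row ids containing it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for field in fields:
            for token in str(row.get(field, "")).lower().split():
                index[token].add(row_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, rows, text):
    """Search step: return the rows that contain every token of the query text."""
    postings = [set(index.get(token, [])) for token in text.lower().split()]
    if not postings:
        return []
    hits = set.intersection(*postings)
    return [rows[i] for i in sorted(hits)]


# Stand-in for rows read from a Delta table (made-up sample data).
rows = [
    {"host": "web-1", "message": "authentication failure for admin"},
    {"host": "web-2", "message": "connection established"},
    {"host": "web-1", "message": "authentication succeeded"},
]

# Stand-in for object storage: the index is written once by the indexing step
# and later loaded by a search process that holds no state of its own.
object_store = {"indexes/logs": build_index(rows, ["message"])}
loaded = object_store["indexes/logs"]
results = search(loaded, rows, "authentication failure")
```

The point of the sketch is the separation of concerns: `build_index` never serves queries and `search` never reads the source rows to build anything, which is what allows the two sides to scale independently.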
Running Mach5 Indexer in Databricks
Execution Context
- Can be triggered manually, on a schedule, or via table updates
- Configurable via JSON/YAML or programmatically in notebooks
Example (Python Pseudo-code)
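    # Pseudo-code: the module and class names are illustrative.
    from Mach5 import Indexer

    indexer = Indexer(
        input_path="dbfs:/mnt/logs/delta",    # Delta table to read
        output_path="s3://mach5-indexes/",    # object storage location for the indexes
        config={"fields": ["timestamp", "message", "host", "severity"]},
    )
    indexer.run()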
- Supports field-level configuration and schema inference
- Fault-tolerant and restartable
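For the JSON configuration route mentioned under Execution Context, the following is a minimal sketch of how such a file might be loaded and checked before a job starts. The key names simply mirror the parameters in the pseudo-code above; they are assumptions, as is the `load_indexer_config` helper, and the real configuration schema may differ.

```python
import json

# Hypothetical configuration document; the key names mirror the pseudo-code
# parameters and are not a documented Mach5 schema.
CONFIG_JSON = """
{
  "input_path": "dbfs:/mnt/logs/delta",
  "output_path": "s3://mach5-indexes/",
  "fields": ["timestamp", "message", "host", "severity"]
}
"""

REQUIRED_KEYS = ("input_path", "output_path", "fields")


def load_indexer_config(text):
    """Parse the JSON text and check that the expected keys are present."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    if not isinstance(config["fields"], list) or not config["fields"]:
        raise ValueError("'fields' must be a non-empty list of column names")
    return config


config = load_indexer_config(CONFIG_JSON)
```

Validating the configuration up front means a scheduled job fails before any indexing work begins if the file is malformed.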
Querying from Notebooks or Kibana
Mach5 exposes a REST API compatible with Elasticsearch, enabling:
- Real-time search from notebooks
- Visualizations via Kibana
- Integrations with SIEM and monitoring tools
Example (Elasticsearch Query DSL request)

    POST /logs/_search
    {
      "query": {
        "match": {
          "message": "authentication failure"
        }
      },
      "aggs": {
        "by_host": {
          "terms": { "field": "host" }
        }
      }
    }
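From a notebook, the same request can be issued with Python's standard library. The sketch below builds the `POST /logs/_search` request and reads the `by_host` buckets out of a response. The base URL is a placeholder for your own deployment, the helper names are hypothetical, authentication headers are omitted, and the sample response is made-up data in the standard Elasticsearch aggregation shape rather than real output.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the address of your Mach5 Search deployment.
BASE_URL = "http://localhost:9200"


def build_search_request(index, body, base_url=BASE_URL):
    """Build (but do not send) a POST /<index>/_search request."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def host_counts(response):
    """Extract {host: doc_count} from the by_host terms aggregation."""
    buckets = response["aggregations"]["by_host"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


query = {
    "query": {"match": {"message": "authentication failure"}},
    "aggs": {"by_host": {"terms": {"field": "host"}}},
}
request = build_search_request("logs", query)

# To execute against a live deployment:
# with urllib.request.urlopen(request) as resp:
#     print(host_counts(json.load(resp)))

# Made-up response fragment used only to demonstrate the parsing step.
sample_response = {
    "aggregations": {"by_host": {"buckets": [
        {"key": "web-1", "doc_count": 12},
        {"key": "web-2", "doc_count": 3},
    ]}}
}
counts = host_counts(sample_response)
```

Keeping request construction separate from sending makes the query easy to inspect or log before it reaches the cluster.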
Ideal Mach5 use cases for Databricks
- Security analytics and alert investigations on Delta Lake logs
- Real-time observability and dashboarding over machine data
- SIEM augmentation without duplicating data into Elasticsearch
- Unified data pipelines using Spark + Mach5 for search at scale
- Cost-efficient operational search over petabyte-scale log volumes
Need help?
If you have any questions about setting up Mach5 Search and need live help, please email us at: info@mach5.io
Just getting started or exploring on your own? Join the conversation in the Mach5 Discord Community - a space to ask questions, share ideas, and learn from other engineers building with Mach5.