Technical deep dive: What is the Altinity ClickHouse Operator for Kubernetes
Learn about the Altinity ClickHouse Operator for Kubernetes, what it does, and when to use it.
In this post, we’ll do a deep dive into the Altinity ClickHouse Operator, what it does, the ideal scenarios for its use, and illustrate its operation with detailed examples.
What is the Altinity ClickHouse Operator?
An operator in the Kubernetes ecosystem refers to a software extension incorporating domain-specific knowledge into the Kubernetes API. This facilitates the automation of complex tasks, enabling Kubernetes to handle them natively.
The Altinity ClickHouse Operator ("ClickHouse Operator"), in particular, is a Kubernetes operator specifically designed to manage ClickHouse, an open-source column-oriented database management system. ClickHouse stands out for its ability to generate real-time analytical data reports. Running ClickHouse on Kubernetes using the ClickHouse Operator brings enhanced scalability, resilience, and orchestration capabilities, making it a preferred choice for managing large-scale analytical workloads.
Why run ClickHouse with Kubernetes?
Combining the power of ClickHouse with the robustness of Kubernetes yields several significant advantages.
Scalability: Kubernetes' inherent ability to scale resources as per workload needs aligns perfectly with ClickHouse's horizontal scalability. This combination ensures that your ClickHouse deployment is always equipped to handle varying data loads.
High Availability: Kubernetes ensures high availability of applications running on it. This means your ClickHouse databases are always up and running, minimizing downtime.
Efficient Resource Management: Kubernetes allows efficient use of hardware resources, ensuring that your ClickHouse deployment utilizes the available resources optimally.
Orchestration: Kubernetes takes care of the orchestration of your ClickHouse deployment, automating tasks such as rollouts, rollbacks, and service discovery.
Features of the ClickHouse Operator
The ClickHouse Operator is equipped with features to simplify the management of ClickHouse in a Kubernetes environment. These include:
Automated cluster provisioning: The ClickHouse Operator can automatically provision new ClickHouse clusters with a single command. This means you can set up complex distributed databases with minimal manual intervention.
Scaling: With the ClickHouse Operator, you can easily scale your ClickHouse clusters up or down. This is crucial for maintaining performance during peak load times and conserving resources during low-traffic periods (see the sketch after this list).
Monitoring: The ClickHouse Operator integrates seamlessly with monitoring tools, providing real-time insights into your ClickHouse clusters' performance. This allows for proactive problem detection and resolution.
Updates and Upgrades: The ClickHouse Operator handles updates and upgrades to your ClickHouse clusters, ensuring they're always running the latest and most secure version of ClickHouse.
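To make the scaling point concrete, here is a minimal sketch, with illustrative names and counts: raising shardsCount in an existing ClickHouseInstallation manifest and re-applying it is all that is required, and the operator reconciles the running cluster to the new layout.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "my-cluster"
spec:
  configuration:
    clusters:
      - name: "my-cluster"
        layout:
          shardsCount: 2 # raised from 1; the operator adds the new shard on apply
Re-applying the manifest triggers the reconciliation:
kubectl apply -f my-cluster.yaml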
When to use the ClickHouse Operator?
The ClickHouse Operator is useful for running ClickHouse in a Kubernetes environment. It is particularly beneficial when managing multiple ClickHouse clusters or scaling your clusters regularly based on traffic patterns. Moreover, if you require a high level of automation for cluster management tasks such as updates and monitoring, the ClickHouse Operator is a perfect choice.
If you choose not to use the ClickHouse Operator, managing ClickHouse on Kubernetes requires manual intervention. You must create and manage the Kubernetes resources yourself, such as StatefulSets, Services, and PersistentVolumeClaims, and handle scaling, monitoring, and upgrades manually or with custom scripts. Nor can you provision new ClickHouse clusters automatically without the operator. So while running ClickHouse on Kubernetes without the operator is possible, it requires significantly more effort and expertise in Kubernetes administration.
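To make that concrete, here is a rough sketch of just one of the objects you would otherwise write and maintain by hand: a single-replica ClickHouse StatefulSet. The image tag, names, and storage size are illustrative, and a matching headless Service and configuration ConfigMaps would still be needed on top.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhouse
spec:
  serviceName: clickhouse # requires a matching headless Service, also hand-written
  replicas: 1
  selector:
    matchLabels:
      app: clickhouse
  template:
    metadata:
      labels:
        app: clickhouse
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:23.8
          ports:
            - containerPort: 8123 # HTTP interface
            - containerPort: 9000 # native protocol
          volumeMounts:
            - name: data
              mountPath: /var/lib/clickhouse
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi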
How to install the ClickHouse Operator
Before you can utilize the ClickHouse Operator, you need to install it in your Kubernetes environment. Here is a step-by-step guide on how to achieve this using two different approaches.
Prerequisites
Before you begin, ensure that you have the following:
A Kubernetes cluster compatible with the ClickHouse Operator version you intend to install. Operator versions before 0.16.0 are compatible with Kubernetes 1.16 up to, but not including, 1.22; versions 0.16.0 and later are compatible with Kubernetes 1.16 and later.
Properly configured kubectl.
Curl.
Installation via kubectl
The operator installation process is straightforward and involves deploying the ClickHouse Operator using its manifest directly from the GitHub repo:
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml
You should get the following results:
customresourcedefinition.apiextensions.k8s.io/clickhouseinstallations.clickhouse.altinity.com created
serviceaccount/clickhouse-operator created
clusterrolebinding.rbac.authorization.k8s.io/clickhouse-operator created
deployment.apps/clickhouse-operator created
Verify that the operator is running. It is deployed in the kube-system namespace by default:
kubectl get pods --namespace kube-system
You should see the ClickHouse Operator running:
NAME READY STATUS RESTARTS AGE
...
clickhouse-operator-5c46dfc7bd-7cz5l 1/1 Running 0 43m
...
Installation via Helm
Starting with version 0.20.1, an official ClickHouse Operator Helm chart is also available.
For installation:
helm repo add clickhouse-operator https://docs.altinity.com/clickhouse-operator/
helm install clickhouse-operator clickhouse-operator/altinity-clickhouse-operator
For upgrade:
helm repo update
helm upgrade clickhouse-operator clickhouse-operator/altinity-clickhouse-operator
For more details, see the official Helm chart for ClickHouse Operator.
Resources description
As part of the installation, several resources are created, including:
Custom Resource Definition (CRD): This extends the Kubernetes API with a new kind, the ClickHouseInstallation. It allows you to manage Kubernetes resources of this kind.
Service Account: This provides an identity for the ClickHouse Operator to interact with the Kubernetes API. It's authenticated as the clickhouse-operator service account.
Cluster Role Binding: This grants permissions defined in a role to a set of users. In this case, the cluster-admin role is granted to the clickhouse-operator service account, thereby giving it permissions across the cluster.
Deployment: This deploys the ClickHouse Operator itself.
Now that you've installed the ClickHouse Operator, you can begin to use it to manage your ClickHouse instances in Kubernetes.
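Before creating your first cluster, you can confirm that the operator's custom resource definitions are registered:
kubectl get crds | grep clickhouse.altinity.com
You should see clickhouseinstallations.clickhouse.altinity.com listed; depending on the operator version, companion CRDs such as clickhouseinstallationtemplates and clickhouseoperatorconfigurations may appear as well.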
How does the ClickHouse Operator work? – Deploy a cluster
Once the operator is installed, you can create a new ClickHouse cluster by applying a YAML file describing your cluster's desired state.
Below is an example of a simple ClickHouse cluster definition:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: "my-cluster"
spec:
configuration:
users:
# printf 'not_for_production' | sha256sum
test_user/password_sha256_hex: 5558ca2004d28646f75b348a8382069365a9e39a780db62423ce33e861d9c7767
# to allow access outside from kubernetes
test_user/networks/ip:
- 0.0.0.0/0
clusters:
- name: "my-cluster"
layout:
shardsCount: 1
In this example, we're creating a new ClickHouse cluster named my-cluster, with a single shard. The operator will create the necessary resources, such as pods and services, to bring up the ClickHouse cluster. To create the cluster, we need to apply the manifest using kubectl. Copy the cluster definition above and save it locally to my-cluster.yaml. It’s common practice to run your components in a dedicated namespace. For this example, we’ll create a namespace called clickhouse-test. To create the namespace, run the following command.
kubectl create namespace clickhouse-test
You should see the following:
namespace/clickhouse-test created
Now, let’s deploy our cluster using the YAML file we created.
kubectl apply -n clickhouse-test -f my-cluster.yaml
Expected output:
clickhouseinstallation.clickhouse.altinity.com/my-cluster created
Check that the cluster has been created and is running:
kubectl get pods -n clickhouse-test
You should see that there is a pod in the “Running” state:
NAME READY STATUS RESTARTS AGE
chi-my-cluster-my-cluster-0-0-0 1/1 Running 0 2m36s
Check the services created by the operator:
kubectl get service -n clickhouse-test
We should see the following services up and running:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
chi-my-cluster-my-cluster-0-0-0 ClusterIP None <none> 8123/TCP,9000/TCP,9009/TCP 4m20s
clickhouse-my-cluster LoadBalancer 10.100.63.177 <redacted>.us-east-2.elb.amazonaws.com 8123:30655/TCP,9000:30812/TCP 4m51s
To interact with the cluster internally, you can execute the clickhouse-client command on the pod:
kubectl -n clickhouse-test exec -it chi-my-cluster-my-cluster-0-0-0 -- clickhouse-client
This command opens a ClickHouse client session connected to the ClickHouse server running in the chi-my-cluster-my-cluster-0-0-0 pod.
You can also access the cluster via the EXTERNAL-IP reported above (<redacted>.us-east-2.elb.amazonaws.com):
clickhouse-client -h <redacted>.us-east-2.elb.amazonaws.com -u test_user --password not_for_production
You can run SQL queries directly from this prompt. For example, to get the version of ClickHouse you are running, you could use:
SELECT version()
Advanced ClickHouse Operator examples
The ClickHouse Operator allows for a high degree of customization and more complex configurations. Here's an example of a ClickHouse cluster with an encrypted, resizable AWS GP3 volume, one shard, and three replicas:
#
# AWS resizable disk example
#
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-resizable-enc
provisioner: ebs.csi.aws.com
parameters:
  encrypted: 'true'
  fsType: ext4
  iops: '3000'
  type: gp3
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "pv-resize-enc"
spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: data-volumeclaim-template
  configuration:
    zookeeper:
      nodes:
        - host: zookeeper.zoo1ns
    clusters:
      - name: "pv-resize-enc"
        layout:
          shardsCount: 1
          replicasCount: 3
  templates:
    volumeClaimTemplates:
      - name: data-volumeclaim-template
        spec:
          storageClassName: gp3-resizable-enc
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Gi
In this example, we're creating a new ClickHouse cluster named pv-resize-enc with encrypted AWS GP3 EBS volumes, one shard, and three replicas. The replicas provide redundancy and failover, and replication relies on the ZooKeeper ensemble referenced in the manifest (zookeeper.zoo1ns). For larger databases, you can also increase shardsCount to distribute data across multiple shards for improved query performance.
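Because the storage class sets allowVolumeExpansion: true, the volumes can later be grown in place. As a sketch, assuming the manifest is saved as pv-resize-enc.yaml: raise the request in the volume claim template, for example from storage: 500Gi to storage: 1Ti, and re-apply it:
kubectl apply -f pv-resize-enc.yaml
The EBS CSI driver can then expand the underlying volumes without recreating the cluster.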
You can interact with the cluster in the same way as before, using the clickhouse-client command. Remember that you need to specify the correct pod name, which will depend on the number of shards and replicas in your cluster.
This advanced usage of the ClickHouse Operator showcases its flexibility in managing complex ClickHouse configurations, making it an indispensable tool for any data professional working with ClickHouse on Kubernetes. See the ClickHouse Operator GitHub repo for examples of additional configurations.
ClickHouse Operator configuration
The ClickHouse Operator has several settings and parameters that control its behavior and the configuration of ClickHouse clusters:
Operator Settings: These settings dictate the behavior of the operator itself. They are initialized from three sources, in order: /etc/clickhouse-operator/config.yaml, the etc-clickhouse-operator-files ConfigMap, and the ClickHouseOperatorConfiguration resource. Changes to these settings are monitored and applied immediately.
ClickHouse Common Configuration Files: These are ready-to-use XML files with sections of ClickHouse configuration. They typically contain general ClickHouse configuration sections, such as network listen endpoints, logger options, etc. They are exposed via config maps.
ClickHouse User Configuration Files: These are ready-to-use XML files with sections of ClickHouse configuration. They typically contain ClickHouse configuration sections with user account specifications. They are exposed via config maps as well.
ClickHouseOperatorConfiguration Resource: This is a Kubernetes custom resource that provides the ClickHouse Operator with its configuration.
ClickHouseInstallationTemplates: The operator provides functionality to specify parts of the ClickHouseInstallation manifest as a set of templates, which are used in all ClickHouseInstallations.
Some specific settings include:
Watch Namespaces: This setting allows you to specify the namespaces where the ClickHouse Operator watches for events. Multiple operators running concurrently should watch different namespaces.
Additional Configuration Files: These settings allow you to specify the paths to folders containing various configuration files for ClickHouse.
Cluster Create/Update/Delete Objects: These settings control the operator's behavior when creating, updating, or deleting Kubernetes objects, such as StatefulSets for ClickHouse clusters.
ClickHouse Settings: These settings allow you to specify default values for ClickHouse user configurations, such as user profile, quota, network IP, and password.
Operator's access to ClickHouse instances: These settings allow you to specify the ClickHouse credentials (username, password, and port) to be used by the operator to connect to ClickHouse instances for metrics requests, schema maintenance, and DROP DNS CACHE.
All these settings and parameters make the ClickHouse Operator a highly configurable tool for managing ClickHouse in a Kubernetes environment.
1. Operator Settings: For example, you might have the following settings in your /etc/clickhouse-operator/config.yaml:
watchNamespaces:
  - namespace1
  - namespace2
2. ClickHouse Common Configuration Files: An example configuration file might look like this:
<yandex>
  <logger>
    <console>1</console>
    <level>trace</level>
  </logger>
</yandex>
3. ClickHouse User Configuration Files: An example configuration file might look like this:
<yandex>
  <users>
    <default>
      <password>example_password</password>
      <networks>
        <ip>::/0</ip>
      </networks>
      <profile>default</profile>
      <quota>default</quota>
    </default>
  </users>
</yandex>
4. ClickHouseOperatorConfiguration Resource: An example resource might look like this:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseOperatorConfiguration"
metadata:
name: "clickhouse-operator-configuration"
spec:
watchNamespaces:
- namespace1
- namespace2
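The resource is applied like any other manifest, typically in the namespace where the operator itself runs (kube-system in the installation above), and the operator picks up changes without a restart. Assuming the resource is saved as clickhouse-operator-configuration.yaml:
kubectl apply -n kube-system -f clickhouse-operator-configuration.yaml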
5. ClickHouseInstallationTemplates: An example template might look like this. This is a minimal sketch that pins a ClickHouse image via a pod template; the template name and image tag are illustrative:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallationTemplate"
metadata:
  name: "clickhouse-version"
spec:
  templates:
    podTemplates:
      - name: default
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:23.8
Specific settings:
- Watch Namespaces: For example, you might specify the following namespaces:
watchNamespaces:
  - namespace1
  - namespace2
- Additional Configuration Files: For example, you might specify the following paths:
additionalConfigurationFiles:
  - /etc/clickhouse-operator/config.yaml
- Cluster Create/Update/Delete Objects: For example, you might specify the following behavior:
clusterCreateObjects:
  - statefulSet
clusterUpdateObjects:
  - statefulSet
clusterDeleteObjects:
  - statefulSet
- ClickHouse Settings: For example, you might specify the following default values:
clickhouse:
  user:
    default:
      profile: default
      quota: default
      networks:
        ip: "::/0"
      password: example_password
- Operator's access to ClickHouse instances: For example, you might specify the following credentials:
operatorCredentials:
  username: operator
  password: example_password
  port: 9000
Security hardening for the ClickHouse Operator
The ClickHouse Operator's security model provides a solid foundation for maintaining a secure environment for your ClickHouse instances on Kubernetes. However, it's essential to understand the model and apply the necessary hardening measures to ensure maximum protection against potential threats.
The default ClickHouse Operator deployment comes with two users, default and clickhouse_operator, both of which are shielded by network restriction rules barring unauthorized access.
Securing the 'default' user
The default user is used for connections between the pods of a ClickHouse cluster, for example when running distributed queries. By default, this user has no password. To secure the default user, the operator applies network security rules that restrict connections to the pods running the ClickHouse cluster.
Securing the 'clickhouse_operator' user
The clickhouse_operator user is used by the operator itself to perform DDL operations when adding or removing ClickHouse replicas and shards, and to collect monitoring data. The username and password are stored in a secret; it is recommended not to embed these credentials directly in the operator configuration.
To change the clickhouse_operator user password, you can modify the etc-clickhouse-operator-files config map or create a ClickHouseOperatorConfiguration object. The operator restricts access to this user using an IP mask.
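For example, the credentials can be kept in a standard Kubernetes secret that the operator configuration then references. The secret name and values below are illustrative:
kubectl create secret generic clickhouse-operator-credentials \
  --namespace kube-system \
  --from-literal=username=clickhouse_operator \
  --from-literal=password='a-strong-password'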
Securing additional ClickHouse users
For additional ClickHouse users, created either with the SQL CREATE USER statement or in a dedicated section of the ClickHouseInstallation, you should ensure that passwords are not exposed. User passwords can be specified in plaintext or as SHA256 hashes (hex format).
However, specifying passwords in plain text is not recommended, even though the operator hashes them when deploying to ClickHouse. It is advisable to provide hashes explicitly, as follows:
spec:
  useTemplates:
    - name: clickhouse-version
  configuration:
    users:
      # printf 'sha256_this_string' | sha256sum
      user1/password_sha256_hex: e7daa9d211e06ccfed5bcdc8a8cec8ef6b38bd59a5efbfa8bc2a847dfe08ed47
For enhanced security, the operator supports reading passwords and password hashes from a secret as follows:
spec:
  configuration:
    users:
      user1/k8s_secret_password: clickhouse-secret/pwduser1
      user2/k8s_secret_password_sha256_hex: clickhouse-secret/pwduser2
      user3/k8s_secret_password_double_sha1_hex: clickhouse-secret/pwduser3
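The manifest above expects a secret named clickhouse-secret with one key per user. Assuming placeholder values, it could be created like this:
kubectl create secret generic clickhouse-secret \
  --from-literal=pwduser1='plaintext-password' \
  --from-literal=pwduser2='<sha256 hash of the password, in hex>' \
  --from-literal=pwduser3='<double sha1 hash of the password, in hex>'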
Adhering to these guidelines ensures that your ClickHouse Operator and the ClickHouse instances it manages are secure and protected against unauthorized access.
Please refer to the official ClickHouse operator hardening guide for detailed instructions and examples.
Build faster with Propel: Serverless ClickHouse for developers
At Propel, we offer a fully managed ClickHouse service that allows you to focus more on drawing insights from your data and less on infrastructure management. Propel provides data-serving APIs, React components, and built-in multi-tenant access controls, making it easier and faster for you to build data-intensive applications.
You can connect your own ClickHouse with Propel, whether it's self-hosted or on the ClickHouse Cloud, or take advantage of our fully managed serverless cloud.
Connect your own ClickHouse
✅ Works with self-hosted ClickHouse.
✅ Works with ClickHouse Cloud.
✅ Full control of your ClickHouse deployment.
✅ Data stays in your infrastructure.
✅ Data is encrypted in transit.
✅ Only pay for data passthrough.
Fully managed serverless cloud
✅ Ingest data from any data source.
✅ No infrastructure to manage or scale.
✅ Serverless auto-scaling.
✅ Mission-critical availability.
✅ Unlimited storage.
✅ Data is encrypted at rest and in transit.
✅ Only pay for the storage you use and queries you make.
Conclusion
The ClickHouse Operator is a powerful tool for managing ClickHouse in a Kubernetes environment. It offers a suite of features, including automated cluster provisioning, scaling, monitoring, and managed updates and upgrades. Whether you're running a single ClickHouse instance or managing a fleet of ClickHouse clusters, the ClickHouse Operator can simplify your operations and make your life easier.
For more information on how to use the ClickHouse Operator, be sure to check out the official ClickHouse Operator documentation.
Further reading
For more insights on how to use ClickHouse for your data operations, check out our other posts. We cover a wide range of topics, from advanced querying techniques to performance tuning for large datasets. Whether you're a beginner or an experienced data professional, there's always something new to learn!