feat(core): introduce typed ClickHouse table API, query builder, and result handling; enhance HTTP client and add schema evolution, batch inserts and mutations; update docs/tests and bump deps

2026-02-27 10:17:32 +00:00
parent 26449e9171
commit aace102868
17 changed files with 7000 additions and 1886 deletions

496
readme.md

@@ -1,256 +1,430 @@
# @push.rocks/smartclickhouse
A TypeScript-based ODM (Object-Document Mapper) for ClickHouse databases, with support for creating and managing tables and handling time-series data.
A TypeScript-based ODM for ClickHouse databases with full CRUD support, a fluent query builder, configurable engines, and automatic schema evolution.
## Issue Reporting and Security
For reporting bugs, issues, or security vulnerabilities, please visit [community.foss.global/](https://community.foss.global/). This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a [code.foss.global/](https://code.foss.global/) account to submit Pull Requests directly.
## Install
To install `@push.rocks/smartclickhouse`, use the following command with npm:
```sh
npm install @push.rocks/smartclickhouse --save
pnpm install @push.rocks/smartclickhouse
```
Or with yarn:
```sh
yarn add @push.rocks/smartclickhouse
```
This will add the package to your project's dependencies.
## Usage
`@push.rocks/smartclickhouse` is an ODM (Object-Document Mapper) module for interacting with ClickHouse databases, using TypeScript for strong typing and a better developer experience. Below is a guide to using the package in various scenarios.
### Setting Up and Starting the Connection
To begin using `@push.rocks/smartclickhouse`, you need to establish a connection with the ClickHouse database. This involves creating an instance of `SmartClickHouseDb` and starting it:
### 🔌 Connecting to ClickHouse
```typescript
import { SmartClickHouseDb } from '@push.rocks/smartclickhouse';
// Create a new instance of SmartClickHouseDb with your ClickHouse database details
const dbInstance = new SmartClickHouseDb({
url: 'http://localhost:8123', // URL of ClickHouse instance
database: 'yourDatabase', // Database name you want to connect to
username: 'default', // Optional: Username for authentication
password: 'password', // Optional: Password for authentication
unref: true // Optional: Allows service to exit while awaiting database startup
const db = new SmartClickHouseDb({
url: 'http://localhost:8123',
database: 'myDatabase',
username: 'default', // optional
password: 'secret', // optional
unref: true, // optional — allow process exit during startup
});
// Start the instance to establish the connection
await dbInstance.start();
await db.start(); // pings until available, creates database if needed
await db.start(true); // drops and recreates database (useful for test suites)
```
### Working with Time Data Tables
The library communicates with ClickHouse over its HTTP interface — no native protocol driver required.
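To make the HTTP transport concrete, here is a minimal sketch of how a query could be turned into a plain HTTP request. `buildClickhouseRequest` and `IHttpQueryOptions` are hypothetical names used only for illustration and are not part of the package; the `database` query parameter and the `X-ClickHouse-User` / `X-ClickHouse-Key` headers belong to ClickHouse's HTTP interface.

```typescript
// Hypothetical helper, not part of @push.rocks/smartclickhouse.
interface IHttpQueryOptions {
  url: string; // e.g. 'http://localhost:8123'
  database: string;
  username?: string;
  password?: string;
}

// Describes the HTTP request for one SQL statement without sending it.
function buildClickhouseRequest(options: IHttpQueryOptions, sql: string) {
  // Strip trailing slashes so the query string is appended exactly once.
  const baseUrl = options.url.replace(/\/+$/, '');
  const url = `${baseUrl}/?database=${encodeURIComponent(options.database)}`;

  const headers: Record<string, string> = { 'Content-Type': 'text/plain' };
  if (options.username) headers['X-ClickHouse-User'] = options.username;
  if (options.password) headers['X-ClickHouse-Key'] = options.password;

  // The SQL text travels in the POST body.
  return { url, init: { method: 'POST', headers, body: sql } };
}
```

The returned object can be handed to `fetch(request.url, request.init)` on any runtime that provides `fetch`. How the library itself assembles its requests may differ.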
`smartclickhouse` allows handling of time-series data through `TimeDataTable`, automating tasks such as table creation and data insertion.
---
#### Creating or Accessing a Table
### 📋 Creating a Typed Table
To create a new time data table or access an existing one:
Use `db.createTable<T>()` with full control over engine, ordering, partitioning, and TTL:
```typescript
const tableName = 'yourTimeDataTable'; // Name of the table you want to access or create
const table = await dbInstance.getTable(tableName);
```
#### Adding Data to the Table
Once you have the table instance, you can insert data into it:
```typescript
await table.addData({
timestamp: Date.now(), // Timestamp in milliseconds
message: 'A log message.', // Arbitrary data field
temperature: 22.5, // Another example field
tags: ['tag1', 'tag2'] // An example array field
});
```
The `addData` method is designed to be flexible, allowing insertion of various data types and automatically managing table schema adjustments.
### Advanced Usage and Custom Data Handling
`smartclickhouse` supports custom data types and complex data structures. For instance, to add support for nested objects or custom data processing before insertion, you might need to extend existing classes or customize the `addData` method to fit your needs.
#### Custom Data Processing
To handle complex data structures or to perform custom data processing before insertion, you can wrap the `addData` call in your own method. Below is an example of extending the `SmartClickHouseDb` class:
```typescript
class CustomClickHouseDb extends SmartClickHouseDb {
public async addCustomData(tableName: string, data: any) {
const table = await this.getTable(tableName);
const customData = {
...data,
processedAt: Date.now(),
customField: 'customValue',
};
await table.addData(customData);
}
interface ILogEntry {
timestamp: number;
level: string;
message: string;
service: string;
duration: number;
}
const customDbInstance = new CustomClickHouseDb({
url: 'http://localhost:8123',
database: 'yourDatabase',
});
await customDbInstance.start();
await customDbInstance.addCustomData('customTable', {
message: 'Test message',
randomField: 123456,
const logs = await db.createTable<ILogEntry>({
tableName: 'logs',
orderBy: ['timestamp', 'service'],
partitionBy: "toYYYYMM(timestamp)",
columns: [
{ name: 'timestamp', type: "DateTime64(3, 'Europe/Berlin')" },
{ name: 'level', type: 'String' },
{ name: 'message', type: 'String' },
{ name: 'service', type: 'String' },
{ name: 'duration', type: 'Float64' },
],
ttl: { column: 'timestamp', interval: '90 DAY' },
});
```
### Bulk Data Insertion
#### ⚙️ Engine Configuration
`@push.rocks/smartclickhouse` supports efficient bulk data insertion mechanisms. This feature is useful when you need to insert a large amount of data in a single operation.
Supports the full MergeTree family:
| Engine | Use Case |
|---|---|
| `MergeTree` | Default — append-only, great for logs and events |
| `ReplacingMergeTree` | Upsert-style mutable data (deduplicates on `OPTIMIZE`) |
| `SummingMergeTree` | Pre-aggregated counters and metrics |
| `AggregatingMergeTree` | Materialized aggregate states |
| `CollapsingMergeTree` | Mutable rows via sign-based collapsing |
| `VersionedCollapsingMergeTree` | Versioned collapsing for concurrent updates |
```typescript
const bulkData = [
{ timestamp: Date.now(), message: 'Message 1', temperature: 20.1 },
{ timestamp: Date.now(), message: 'Message 2', temperature: 21.2 },
// Additional data entries...
];
// ReplacingMergeTree for upsert-style mutable data
const users = await db.createTable<IUser>({
tableName: 'users',
engine: { engine: 'ReplacingMergeTree', versionColumn: 'updatedAt' },
orderBy: 'userId',
});
await table.addData(bulkData);
// SummingMergeTree for pre-aggregated metrics
const metrics = await db.createTable<IMetric>({
tableName: 'metrics',
engine: { engine: 'SummingMergeTree' },
orderBy: ['date', 'metricName'],
});
```
### Querying Data
#### 🧬 Auto-Schema Evolution
Fetching data from the ClickHouse database includes operations such as retrieving the latest entries, entries within a specific timestamp range, or streaming new entries.
#### Retrieving the Last N Entries
To retrieve the last `N` number of entries:
When `autoSchemaEvolution` is enabled (default), new columns are created automatically from your data via `ALTER TABLE ADD COLUMN`:
```typescript
const latestEntries = await table.getLastEntries(10);
console.log('Latest Entries:', latestEntries);
const flexTable = await db.createTable<any>({
tableName: 'events',
orderBy: 'timestamp' as any,
autoSchemaEvolution: true,
});
// First insert creates the base schema
await flexTable.insert({ timestamp: Date.now(), message: 'hello' });
// New fields trigger ALTER TABLE ADD COLUMN automatically
await flexTable.insert({
timestamp: Date.now(),
message: 'world',
userId: 'u123', // → new String column
responseTime: 150.5, // → new Float64 column
tags: ['a', 'b'], // → new Array(String) column
});
```
#### Retrieving Entries Newer than a Specific Timestamp
Nested objects are automatically flattened (e.g. `{ deep: { field: 'value' } }` becomes column `deep_field`).
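The flattening rule can be sketched as a small recursive function. This is an illustration of the naming scheme described above, not the library's implementation; `flattenObject` is a hypothetical name, and passing arrays and `Date` values through unchanged is an assumption.

```typescript
// Hypothetical sketch of underscore-joined flattening.
type TFlatRecord = Record<string, unknown>;

function flattenObject(input: Record<string, unknown>, prefix = ''): TFlatRecord {
  const flat: TFlatRecord = {};
  for (const [key, value] of Object.entries(input)) {
    // Join parent and child keys with an underscore: deep + field -> deep_field
    const columnName = prefix ? `${prefix}_${key}` : key;
    const isPlainObject =
      value !== null &&
      typeof value === 'object' &&
      !Array.isArray(value) &&
      !(value instanceof Date);
    if (isPlainObject) {
      // Recurse into nested objects and merge their flattened keys.
      Object.assign(flat, flattenObject(value as Record<string, unknown>, columnName));
    } else {
      // Scalars, arrays, and dates become column values as-is (assumption).
      flat[columnName] = value;
    }
  }
  return flat;
}

const flattened = flattenObject({ timestamp: 1, deep: { field: 'value' }, tags: ['a'] });
// flattened has the keys: timestamp, deep_field, tags
```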
To retrieve entries that are newer than a specific timestamp:
---
### ✏️ Inserting Data
```typescript
const timestamp = Date.now() - 60000; // 1 minute ago
const newEntries = await table.getEntriesNewerThan(timestamp);
console.log('New Entries:', newEntries);
// Single row
await logs.insert({
timestamp: Date.now(),
level: 'info',
message: 'Request processed',
service: 'api',
duration: 42.5,
});
// Multiple rows
await logs.insertMany([
{ timestamp: Date.now(), level: 'info', message: 'msg1', service: 'api', duration: 10 },
{ timestamp: Date.now(), level: 'error', message: 'msg2', service: 'worker', duration: 500 },
]);
// Large batch with configurable chunk size
await logs.insertBatch(largeArray, { batchSize: 50000 });
```
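As a rough illustration of what a `batchSize` option implies, the sketch below splits a large array into fixed-size chunks that could then be inserted one at a time. `chunkRows` is a hypothetical helper; whether `insertBatch` chunks rows this way internally is an assumption.

```typescript
// Hypothetical helper: split rows into chunks of at most batchSize elements.
function chunkRows<T>(rows: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let start = 0; start < rows.length; start += batchSize) {
    // slice() clamps the end index, so the last chunk may be shorter.
    chunks.push(rows.slice(start, start + batchSize));
  }
  return chunks;
}

const chunks = chunkRows([1, 2, 3, 4, 5], 2);
// chunks is [[1, 2], [3, 4], [5]]
```

Each chunk could be passed to `insertMany` in sequence, which keeps the size of a single HTTP request bounded.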
#### Retrieving Entries Between Two Timestamps
#### 🌊 Streaming Inserts
To retrieve entries between two timestamps:
Use `createInsertStream()` for push-based insert buffering with automatic batch flushing:
```typescript
const startTimestamp = Date.now() - 120000; // 2 minutes ago
const endTimestamp = Date.now() - 5000; // 5 seconds ago
const entriesBetween = await table.getEntriesBetween(startTimestamp, endTimestamp);
console.log('Entries Between:', entriesBetween);
const stream = logs.createInsertStream({ batchSize: 100, flushIntervalMs: 1000 });
stream.push({ timestamp: Date.now(), level: 'info', message: 'event1', service: 'api', duration: 10 });
stream.push({ timestamp: Date.now(), level: 'info', message: 'event2', service: 'api', duration: 20 });
// Signal end-of-stream and wait for final flush
stream.signalComplete();
await stream.completed;
```
### Managing and Deleting Data
---
The module provides functionality for managing and deleting data within the ClickHouse database.
### 🔍 Querying with the Fluent Builder
#### Deleting Old Entries
You can delete entries older than a specified number of days:
The query builder provides type-safe, chainable query construction:
```typescript
// Ensure there are entries before deletion
let entries = await table.getLastEntries(1000);
console.log('Entries before deletion:', entries.length);
// Basic filtered query
const errors = await logs.query()
.where('level', '=', 'error')
.orderBy('timestamp', 'DESC')
.limit(100)
.toArray();
// Delete all entries older than now
await table.deleteOldEntries(0);
// Multiple conditions with AND / OR
const result = await logs.query()
.where('service', '=', 'api')
.and('duration', '>', 1000)
.and('level', 'IN', ['error', 'warn'])
.orderBy('timestamp', 'DESC')
.limit(50)
.toArray();
// Verify the entries are deleted
entries = await table.getLastEntries(1000);
console.log('Entries after deletion:', entries.length);
// OR conditions
const mixed = await logs.query()
.where('level', '=', 'error')
.or('duration', '>', 5000)
.toArray();
// Get first match
const latest = await logs.query()
.orderBy('timestamp', 'DESC')
.first();
// Count
const errorCount = await logs.query()
.where('level', '=', 'error')
.count();
// Pagination with limit/offset
const page2 = await logs.query()
.orderBy('timestamp', 'DESC')
.limit(20)
.offset(20)
.toArray();
// Aggregation with raw expressions
const stats = await logs.query()
.selectRaw('service', 'count() as requests', 'avg(duration) as avgDuration')
.groupBy('service')
.having('requests > 100')
.orderBy('requests' as any, 'DESC')
.toArray();
// Select specific columns
const names = await logs.query()
.select('service', 'level')
.limit(10)
.toArray();
// Raw WHERE expression for advanced use cases
const advanced = await logs.query()
.whereRaw("toHour(timestamp) BETWEEN 9 AND 17")
.toArray();
// Debug — inspect generated SQL without executing
console.log(logs.query().where('level', '=', 'error').limit(10).toSQL());
// → SELECT * FROM mydb.logs WHERE level = 'error' LIMIT 10 FORMAT JSONEachRow
```
#### Deleting the Entire Table
#### Supported Operators
To delete the entire table and all its data:
`=`, `!=`, `>`, `>=`, `<`, `<=`, `LIKE`, `NOT LIKE`, `IN`, `NOT IN`, `BETWEEN`
#### 📦 Result Sets
Use `.execute()` to get a `ClickhouseResultSet` with convenience methods:
```typescript
await table.delete();
const resultSet = await logs.query()
.orderBy('timestamp', 'DESC')
.limit(100)
.execute();
// Verify table deletion
const result = await dbInstance.clickhouseHttpClient.queryPromise(`
SHOW TABLES FROM ${dbInstance.options.database} LIKE '${table.options.tableName}'
`);
console.log('Table removed after deletion:', result.length === 0);
resultSet.isEmpty(); // boolean
resultSet.rowCount; // number
resultSet.first(); // T | null
resultSet.last(); // T | null
resultSet.map(r => r.service); // string[]
resultSet.filter(r => r.duration > 100); // ClickhouseResultSet<T>
resultSet.toObservable(); // RxJS Observable<T>
resultSet.toArray(); // T[]
```
### Observing Real-Time Data
---
To observe new entries in real-time, you can stream new data entries using the RxJS Observable:
### 🔄 Updating Data
Updates use ClickHouse mutations (`ALTER TABLE UPDATE`). The library automatically waits for mutations to complete.
> 💡 For frequently updated data, consider using `ReplacingMergeTree` instead — it's the idiomatic ClickHouse approach for mutable rows.
```typescript
const stream = table.watchNewEntries();
await logs.update(
{ level: 'warn' }, // SET clause
(q) => q.where('level', '=', 'warning'), // WHERE clause
);
```
const subscription = stream.subscribe((entry) => {
A WHERE clause is **required** — you can't accidentally update every row.
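To show what such a mutation looks like on the wire, here is a minimal sketch that builds an `ALTER TABLE ... UPDATE` statement and refuses to do so without a filter. `buildUpdateSql` and `escapeValue` are hypothetical names, and the escaping is deliberately simplified; the library's actual SQL generation may differ.

```typescript
// Hypothetical, simplified escaping: numbers as-is, strings single-quoted.
function escapeValue(value: string | number): string {
  if (typeof value === 'number') return String(value);
  return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
}

// Builds a ClickHouse UPDATE mutation; an empty WHERE clause is rejected.
function buildUpdateSql(
  table: string,
  changes: Record<string, string | number>,
  whereSql: string,
): string {
  if (whereSql.trim() === '') {
    throw new Error('Refusing to build an UPDATE mutation without a WHERE clause');
  }
  const assignments = Object.entries(changes).map(
    ([column, value]) => `${column} = ${escapeValue(value)}`,
  );
  if (assignments.length === 0) {
    throw new Error('Nothing to update');
  }
  return `ALTER TABLE ${table} UPDATE ${assignments.join(', ')} WHERE ${whereSql}`;
}

const sql = buildUpdateSql('mydb.logs', { level: 'warn' }, "level = 'warning'");
// sql: ALTER TABLE mydb.logs UPDATE level = 'warn' WHERE level = 'warning'
```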
---
### 🗑️ Deleting Data
```typescript
// Targeted delete with builder
await logs.deleteWhere(
(q) => q.where('level', '=', 'debug').and('timestamp', '<', cutoffDate),
);
// Delete by age (interval syntax)
await logs.deleteOlderThan('timestamp', '30 DAY');
// Drop entire table
await logs.drop();
```
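One plausible translation of the interval form into SQL is sketched below. `buildDeleteOlderThanSql` is a hypothetical helper, and the assumption that the library emits an `ALTER TABLE ... DELETE` mutation (rather than some other delete form) is not confirmed by the source. The point of the sketch is that the interval string and column name are validated before being placed into the statement.

```typescript
// Hypothetical helper: build a delete-by-age mutation from '30 DAY'-style input.
function buildDeleteOlderThanSql(table: string, column: string, interval: string): string {
  // Only plain identifiers are accepted as column names.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(column)) {
    throw new Error(`Invalid column name: ${column}`);
  }
  // Accept '<integer> <unit>' with ClickHouse interval units only.
  const match = /^(\d+)\s+(SECOND|MINUTE|HOUR|DAY|WEEK|MONTH|QUARTER|YEAR)$/i.exec(interval.trim());
  if (!match) {
    throw new Error(`Invalid interval: ${interval}`);
  }
  const amount = match[1];
  const unit = match[2].toUpperCase();
  return `ALTER TABLE ${table} DELETE WHERE ${column} < now() - INTERVAL ${amount} ${unit}`;
}

const deleteSql = buildDeleteOlderThanSql('mydb.logs', 'timestamp', '30 DAY');
// deleteSql: ALTER TABLE mydb.logs DELETE WHERE timestamp < now() - INTERVAL 30 DAY
```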
---
### 👀 Watching for New Data
Stream new entries via polling with an RxJS Observable:
```typescript
const subscription = logs.watch({ pollInterval: 2000 }).subscribe((entry) => {
console.log('New entry:', entry);
});
// Simulate adding new entries
let i = 0;
while (i < 10) {
await table.addData({
timestamp: Date.now(),
message: `streaming message ${i}`,
});
i++;
await new Promise((resolve) => setTimeout(resolve, 1000)); // Add a delay to simulate real-time data insertion
}
// Stop watching
subscription.unsubscribe();
```
This method allows continuous monitoring of data changes and integrating the collected data into other systems for real-time applications.
---
### Comprehensive Feature Set
### 🛠️ Utilities
While the examples provided cover the core functionalities of the `@push.rocks/smartclickhouse` module, it also offers a wide range of additional features, including:
```typescript
await logs.getRowCount(); // total row count
await logs.optimize(true); // OPTIMIZE TABLE FINAL (dedup for ReplacingMergeTree)
await logs.waitForMutations(); // wait for pending mutations to complete
await logs.updateColumns(); // refresh column metadata from system.columns
```
- **Error Handling and Reconnection Strategies**: Robust error handling mechanisms ensure your application remains reliable. Automatic reconnection strategies help maintain persistent connections with the ClickHouse database.
- **Materialized Views and MergeTree Engines**: Support for ClickHouse-specific features such as materialized views and aggregating MergeTree engines, enhancing the module's capabilities in handling large-scale data queries and management.
- **Efficient Data Handling**: Techniques for managing and querying large time-series datasets, providing optimal performance and reliability.
---
### Contribution
### 🔧 Raw Queries
Contributions to `@push.rocks/smartclickhouse` are welcome. Whether through submitting issues, proposing improvements, or adding to the codebase, your input is valuable. The project is designed to be open and accessible, striving for a high-quality, community-driven development process.
Execute arbitrary SQL directly on the database:
To contribute:
```typescript
const result = await db.query<{ total: string }>(
'SELECT count() as total FROM mydb.logs FORMAT JSONEachRow'
);
```
1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Commit your changes (`git commit -am 'Add some feature'`).
4. Push to the branch (`git push origin feature-branch`).
5. Create a new Pull Request.
---
The above scenarios cover the essential functionality and the more advanced use cases of `@push.rocks/smartclickhouse`, providing a comprehensive guide to integrating the module into your projects. Happy coding!
### 🏛️ Backward Compatibility
The legacy `getTable()` API still works exactly as before. It returns a `TimeDataTable` pre-configured with MergeTree, timestamp ordering, auto-schema evolution, and TTL:
```typescript
const table = await db.getTable('analytics');
// Insert — accepts arbitrary JSON objects, auto-flattens nested fields
await table.addData({
timestamp: Date.now(),
message: 'hello',
nested: { field: 'value' }, // stored as column `nested_field`
});
// Query
const entries = await table.getLastEntries(10);
const recent = await table.getEntriesNewerThan(Date.now() - 60000);
const range = await table.getEntriesBetween(startMs, endMs);
// Delete
await table.deleteOldEntries(30); // remove entries older than 30 days
// Watch
table.watchNewEntries().subscribe(entry => console.log(entry));
// Drop
await table.delete();
```
You can also use the factory function directly:
```typescript
import { createTimeDataTable } from '@push.rocks/smartclickhouse';
const table = await createTimeDataTable(db, 'analytics', 90 /* retain days */);
```
---
### 🐳 Running ClickHouse Locally
```sh
docker run --name clickhouse-server \
--ulimit nofile=262144:262144 \
-p 8123:8123 -p 9000:9000 \
-e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 \
clickhouse/clickhouse-server
```
The HTTP interface is available at `http://localhost:8123` with a playground at `http://localhost:8123/play`.
---
### 📚 Exported Types
The library exports all types for full TypeScript integration:
```typescript
import type {
TClickhouseColumnType, // String, UInt64, Float64, DateTime64, Array(...), etc.
TClickhouseEngine, // MergeTree family engine names
IEngineConfig, // Engine + version/sign column config
IClickhouseTableOptions, // Full table creation options
IColumnDefinition, // Column name + type + default + codec
IColumnInfo, // Column metadata from system.columns
TComparisonOperator, // =, !=, >, <, LIKE, IN, BETWEEN, etc.
} from '@push.rocks/smartclickhouse';
```
Utility functions are also exported:
```typescript
import { escapeClickhouseValue, detectClickhouseType } from '@push.rocks/smartclickhouse';
escapeClickhouseValue("O'Brien"); // → "'O\\'Brien'"
escapeClickhouseValue(42); // → '42'
escapeClickhouseValue(['a', 'b']); // → "('a', 'b')"
detectClickhouseType('hello'); // → 'String'
detectClickhouseType(3.14); // → 'Float64'
detectClickhouseType([1, 2]); // → 'Array(Float64)'
```
## License and Legal Information
This repository contains open-source code that is licensed under the MIT License. A copy of the MIT License can be found in the [license](license) file within this repository.
This repository contains open-source code licensed under the MIT License. A copy of the license can be found in the [LICENSE](./LICENSE) file.
**Please note:** The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.
### Trademarks
This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH and are not included within the scope of the MIT license granted herein. Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines, and any usage must be approved in writing by Task Venture Capital GmbH.
This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH or third parties, and are not included within the scope of the MIT license granted herein.
Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines or the guidelines of the respective third-party owners, and any usage must be approved in writing. Third-party trademarks used herein are the property of their respective owners and used only in a descriptive manner, e.g. for an implementation of an API or similar.
### Company Information
Task Venture Capital GmbH
Registered at District court Bremen HRB 35230 HB, Germany
Task Venture Capital GmbH
Registered at District Court Bremen HRB 35230 HB, Germany
For any legal inquiries or if you require further information, please contact us via email at hello@task.vc.
For any legal inquiries or further information, please contact us via email at hello@task.vc.
By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.