Preparers chunk text operation examples

These examples use preparers with the ChunkText operation in AI Accelerator.

Tip

This operation transforms the shape of the data, automatically unnesting collections by introducing a part_id column. See the unnesting concept for more detail.

Primitive

-- Only specify a desired length
SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10}');
Output
 part_id |   chunk
---------+-----------
       0 | This is a
       1 | simple
       2 | test
       3 | sentence.
(4 rows)

-- Specify a desired length and a maximum length
SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10, "max_length": 15}');
Output
 part_id |    chunk
---------+-------------
       0 | This is a
       1 | simple test
       2 | sentence.
(3 rows)

-- Named parameters
SELECT * FROM aidb.chunk_text(
    input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.',
    options => '{"desired_length": 40}'
);
Output
 part_id |                 chunk
---------+----------------------------------------
       0 | This is a significantly longer text
       1 | example that might require splitting
       2 | into smaller chunks.
       3 | The purpose of this function is to
       4 | partition text data into segments of a
       5 | specified maximum length, for example,
       6 | this sentence 145 is characters.
       7 | This enables processing or storage of
       8 | data in manageable parts.
(9 rows)
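
To see how close each chunk comes to the requested size, the function's result can be combined with PostgreSQL's built-in length() function. The following is a minimal sketch that relies only on the part_id and chunk columns shown in the outputs above. The chunk_length alias is introduced here for illustration and is not something aidb.chunk_text returns.

```sql
-- Show each chunk alongside its character count.
-- chunk_length is an illustrative alias, not a column produced by aidb.chunk_text.
SELECT part_id,
       chunk,
       length(chunk) AS chunk_length
FROM aidb.chunk_text(
    input => 'This is a simple test sentence.',
    options => '{"desired_length": 10}'
)
ORDER BY part_id;
```

Comparing chunk_length against desired_length (and max_length, when set) can help when tuning these options for your own text.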

Preparer with table data source

-- Create source test table
CREATE TABLE source_table__1628
(
    id      INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);
INSERT INTO source_table__1628 (id, content)
VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.'),
       (2, 'This is sentence number one. This is sentence number one.');

SELECT aidb.create_table_preparer(
    name => 'preparer__1628',
    operation => 'ChunkText',
    source_table => 'source_table__1628',
    source_data_column => 'content',
    destination_table => 'chunked_data__1628',
    destination_data_column => 'chunks',
    source_key_column => 'id',
    destination_key_column => 'id',
    options => '{"desired_length": 120}'::JSONB  -- Configuration for the ChunkText operation
);

SELECT aidb.bulk_data_preparation('preparer__1628');

SELECT * FROM chunked_data__1628;
Output
 id | part_id | unique_id |                                                        chunks
----+---------+-----------+-----------------------------------------------------------------------------------------------------------------------
 1  |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
 1  |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this
 1  |       2 | 1.part.2  | sentence 145 is characters. This enables processing or storage of data in manageable parts.
 2  |       0 | 2.part.0  | This is sentence number one. This is sentence number one.
(4 rows)
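
Because each source row is unnested into several destination rows, it can be useful to summarize the result per source key. The query below is a sketch that uses only the id, part_id, and chunks columns shown in the output above. The chunk_count, longest_chunk, and reassembled aliases are illustrative, and joining chunks with a single space is an assumption about how whitespace was handled at the chunk boundaries.

```sql
-- Summarize the chunks produced for each source row.
-- All aliases below are illustrative names, not part of the preparer's output.
SELECT id,
       count(*)                                  AS chunk_count,
       max(length(chunks))                       AS longest_chunk,
       -- Assumption: chunks were split at whitespace, so a single space rejoins them.
       string_agg(chunks, ' ' ORDER BY part_id)  AS reassembled
FROM chunked_data__1628
GROUP BY id
ORDER BY id;
```

Given the output shown above, this should report three chunks for id 1 and one chunk for id 2.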
