Converting a relational database (RDBMS) to a vector database is increasingly important for AI, search, and recommendation applications. Here are the key best practices for a successful transition.
1. Understand Your Data & Use Case
- Data Analysis: Identify which tables/fields in your RDBMS contain the information to be vectorized: typically unstructured data like text, images, or user profiles.
- Define Use Case: Are you enabling semantic search, recommendations, or LLM-powered chat? This influences your architecture and embedding strategy.
2. Generate Embeddings
- Choose Embedding Model: Use an appropriate model (e.g., OpenAI's embedding APIs, open-source sentence-transformers models, or a custom model) to convert the selected data into high-dimensional vectors.
- Schema & Data Vectorization: Decide whether you'll vectorize only the schema (structure, relationships) or both the schema and the actual data. Schema embeddings help with query understanding; data embeddings support direct semantic search.
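The embedding step above can be sketched in a few lines. This is a minimal, self-contained illustration: `toy_embed` is a hypothetical stand-in for a real model call, included only so the shape of the output (a fixed-dimension, normalized vector per record) is concrete.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for a real embedding model (e.g., an OpenAI or
    sentence-transformers call): hashes the text into a fixed-size,
    L2-normalized vector so cosine similarity is well defined."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [digest[i % len(digest)] / 255.0 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

row = {"id": 42, "content": "red trail running shoes"}
embedding = toy_embed(row["content"])
# The length of this list must match the dimension you later declare
# in the vector column, e.g. vector(1536) for many OpenAI models.
print(len(embedding))
```

In production the only thing that changes is the body of the embedding function; the contract (one fixed-length vector per record, keyed by the record's ID) stays the same.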
3. Select Your Migration Approach
- Hybrid Approach (Recommended): Integrate vector capabilities within your existing RDBMS (e.g., using PostgreSQL's pgvector extension) so you can store vectors alongside structured data, maintaining ACID compliance and minimizing infrastructure sprawl.
- Full Migration: Move relevant data to a dedicated vector database (e.g., Pinecone, Qdrant, Milvus) for specialized workloads, especially at scale.
4. Data Transformation & Loading
- Transform Data: Convert structured RDBMS records into vectors using your chosen model.
- Batch/Stream Loading: Import embeddings into your target system along with their metadata (item IDs, text, user, etc.) for easy retrieval.
- Schema Mapping: Handle data type conversions so that each RDBMS row has a corresponding vector and an associated primary key (or other ID).
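The loading steps above reduce to a simple batched insert keyed by the source primary key. The sketch below uses an in-memory SQLite table as a stand-in for the target store, with the embedding serialized as JSON; with pgvector you would insert into a real `vector(N)` column instead. `fake_embed` is a placeholder, not a real model.

```python
import json
import sqlite3

# In-memory stand-in for the target vector store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings (id INTEGER PRIMARY KEY, content TEXT, embedding TEXT)"
)

# Rows pulled from the source RDBMS: (primary key, content to vectorize).
source_rows = [(1, "laptop stand"), (2, "mechanical keyboard")]

def fake_embed(text):
    # Placeholder for a (usually batched) model call.
    return [float(len(text)), 0.0, 1.0]

# Map each source row to (id, content, embedding) and load in one batch.
batch = [(rid, text, json.dumps(fake_embed(text))) for rid, text in source_rows]
conn.executemany("INSERT INTO embeddings VALUES (?, ?, ?)", batch)

count = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
print(count)  # 2
```

Keeping the source primary key as the vector store's ID is what later lets you join search results back to the relational data.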
5. Indexing & Optimization
- Build Efficient Indexes: Create vector indexes (e.g., HNSW, IVFFlat) to enable fast approximate similarity search. In PostgreSQL with pgvector, HNSW indexes generally offer the best query performance.
- Configure Query Tuning: Adjust database/vector engine parameters (e.g., pgvector's hnsw.ef_search) to balance recall against query latency.
6. Maintain Relational Integrity
- Metadata Preservation: Always keep relationships (foreign keys, constraints) intact or mirrored in metadata within the vector system.
- Hybrid Queries: Many use cases require combining relational and vector queries; ensure your architecture supports this.
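A hybrid query typically filters on relational attributes first, then ranks the surviving candidates by vector similarity. A minimal sketch, with SQLite standing in for the relational side and a plain-Python cosine ranking standing in for the vector engine (in pgvector both steps can be a single SQL query):

```python
import math
import sqlite3

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "shoes"), (2, "shoes"), (3, "hats")])

# Embeddings keyed by the same primary key, mirroring the RDBMS relationship.
vectors = {1: [1.0, 0.0], 2: [0.6, 0.8], 3: [0.0, 1.0]}
query_vec = [1.0, 0.0]

# Step 1: relational filter. Step 2: vector ranking over the survivors.
candidate_ids = [r[0] for r in conn.execute(
    "SELECT id FROM products WHERE category = ?", ("shoes",))]
ranked = sorted(candidate_ids,
                key=lambda i: cosine(query_vec, vectors[i]),
                reverse=True)
print(ranked)  # [1, 2] -- item 3 was filtered out before ranking
```

The shared primary key is doing the work here: it is the join point between the relational filter and the vector ranking.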
7. Integration With AI/ML Workflows
- Seamless Integration: Make sure your new vector system connects easily with machine learning pipelines.
- Real-Time Vectorization: Consider streaming new data through your embedding model as it enters the system, so embeddings stay up to date automatically.
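One way to realize real-time vectorization is an ingest hook that embeds each record at write time, so the vector store never lags the relational source. A sketch with illustrative names (`store`, `embed`, `on_insert` are all hypothetical; in practice this might be a trigger, a CDC consumer, or application-level middleware):

```python
# In-memory stand-in for the vector store.
store = {}

def embed(text):
    # Placeholder for a real embedding-model call.
    return [float(len(text))]

def on_insert(record_id, content):
    # Hook invoked on every new relational write: embed immediately
    # and upsert into the vector store under the same primary key.
    store[record_id] = {"content": content, "embedding": embed(content)}

on_insert(7, "noise-cancelling headphones")
print(store[7]["embedding"])
```

The same hook can be reused for updates, with deletes removing the corresponding vector to keep the two stores consistent.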
Example: Migrating PostgreSQL to Vector Database with pgvector
- Install the pgvector extension.
- Create a table with a vector column:

  ```sql
  CREATE TABLE embeddings (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)
  );
  ```

- Insert generated embeddings alongside IDs and content fields.
- Create a vector index for fast search:

  ```sql
  CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops);
  ```
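With the index in place, similarity search is a plain `ORDER BY` over pgvector's cosine-distance operator (`<=>`). A sketch, assuming the query embedding is supplied as a parameter:

```sql
-- Return the 5 rows closest to the query embedding by cosine distance;
-- $1 is the query vector literal, e.g. '[0.12, -0.03, ...]'.
SELECT id, content
FROM embeddings
ORDER BY embedding <=> $1
LIMIT 5;
```

Because `vector_cosine_ops` was used when building the index, this query can be served by the HNSW index rather than a sequential scan.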
Key Considerations
- Data Security & Compliance: Retain privacy and governance controls.
- Scalability: Choose scalable solutions and optimize storage where possible.
- Testing: Validate retrieval accuracy against traditional SQL search.
Summary Table
| Step | Description | 
|---|---|
| Analyze Data | Select unstructured fields to vectorize | 
| Generate Embeddings | Use ML models to create vectors | 
| Migration Approach | Hybrid (pgvector) or full vector DB | 
| Data Transformation | Map each record to embedding + ID | 
| Indexing | Create HNSW/IVFFlat for search performance | 
| Maintain Integrity | Keep metadata/relationships | 
| Integrate AI/ML | Stream new data into a retrieval pipeline | 