Leverage open models like Gemma 2 on GKE with LangChain

In my previous posts, we explored how LangChain simplifies AI application development and how to deploy Gemini-powered LangChain applications on GKE. Now, let's take a look at a slightly different approach: running your own instance of Gemma, Google's open large language model, directly within your GKE cluster and integrating it with LangChain.
While using an LLM endpoint like Gemini is convenient, running an open model like Gemma 2 on your GKE cluster can offer several advantages:
- Control: You have complete control over the model, its resources, and its scaling. This is particularly important for applications with strict performance or security requirements.
- Customization: You can fine-tune the model on your own datasets to optimize it for specific tasks or domains.
- Cost optimization: For high-volume usage, running your own instance can potentially be more cost-effective than using the API.
- Data locality: Keep your data and model within your controlled environment, which can be crucial for compliance and privacy.
- Experimentation: You can experiment with the latest research and techniques without being limited by what a hosted API exposes.
Deploying Gemma on GKE involves several steps, from setting up your GKE cluster to configuring LangChain to use your Gemma instance as its LLM.
To use the Gemma 2 model, you first need a Hugging Face account. Start by creating one if you don't already have one, and generate an access token with read permissions from your settings page. Make sure to note down the token value; we'll need it in a bit.
If you don't already have a GKE cluster, you can create one through the Google Cloud Console or with the gcloud command-line tool. Make sure to choose a machine type with sufficient resources to run Gemma, such as the g2-standard family, which includes an attached NVIDIA L4 GPU. To keep things simple, we can create a GKE Autopilot cluster, which takes care of provisioning the right nodes for us.
gcloud container clusters create-auto langchain-cluster \
  --project=PROJECT_ID \
  --region=us-central1
For this example, we'll be deploying an instruction-tuned instance of Gemma 2 using a vLLM image. The following manifest describes a deployment and corresponding service for the gemma-2-2b-it model, along with a Secret holding your Hugging Face token. Replace HUGGINGFACE_TOKEN with the token you generated earlier.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gemma-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
        [website]: gemma-2-2b-it
        [website]: vllm
        [website]: model-garden
    spec:
      containers:
      - name: inference-server
        image: [website]
        resources:
          requests:
            cpu: 2
            memory: 34Gi
            ephemeral-storage: 10Gi
            [website]: 1
          limits:
            cpu: 2
            memory: 34Gi
            ephemeral-storage: 10Gi
            [website]: 1
        args:
        - python
        - -m
        - vllm.entrypoints.api_server
        - [website]
        - --port=8000
        - --model=google/gemma-2-2b-it
        - --tensor-parallel-size=1
        - --swap-space=16
        - [website]
        - --enable-chunked-prefill
        - --disable-log-stats
        env:
        - name: MODEL_ID
          value: google/gemma-2-2b-it
        - name: DEPLOY_SOURCE
          value: "UI_NATIVE_MODEL"
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      nodeSelector:
        [website]: nvidia-l4
---
apiVersion: v1
kind: Service
metadata:
  name: llm-service
spec:
  selector:
    app: gemma-server
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
---
apiVersion: v1
kind: Secret
metadata:
  name: hf-secret
type: Opaque
stringData:
  hf_api_token: HUGGINGFACE_TOKEN
Save this to a file called [website], then deploy it to your cluster:
kubectl apply -f [website]
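Provisioning can take a few minutes: on Autopilot, GKE first has to spin up a GPU node, and vLLM then needs to download the model weights from Hugging Face. You can keep an eye on the rollout with standard kubectl commands, for example:

# Wait for the Gemma deployment to become ready
kubectl rollout status deployment/gemma-deployment

# Follow the vLLM server logs while the model loads
kubectl logs -f deployment/gemma-deployment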
Now that we have our GKE cluster and Gemma deployed, we need to create our LangChain application and deploy it. If you've followed my previous post, you'll notice that these steps are very similar. The main differences are that we're pointing LangChain to Gemma instead of Gemini, and that our LangChain application uses a custom LLM class to call our local instance of Gemma.
First, we need to package our LangChain application into a Docker container. This involves creating a Dockerfile that specifies the environment and dependencies for our application. Here is a Python application using LangChain and Gemma, which we'll save as [website]:
from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from [website] import LLM
from langchain_core.prompts import ChatPromptTemplate
from typing import Any, Optional
from flask import Flask, request
import requests


class VLLMServerLLM(LLM):
    vllm_url: str
    model: Optional[str] = None
    temperature: float = [website]
    max_tokens: int = 2048

    @property
    def _llm_type(self) -> str:
        return "vllm_server"

    def _call(
        self,
        prompt: str,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        headers = {"Content-Type": "application/json"}
        payload = {
            "prompt": prompt,
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
            **kwargs,
        }
        if self.model:
            payload["model"] = self.model
        try:
            response = requests.post(self.vllm_url, headers=headers, json=payload, timeout=120)
            response.raise_for_status()
            json_response = response.json()
            if isinstance(json_response, dict) and "predictions" in json_response:
                text = json_response["predictions"][0]
            else:
                raise ValueError(f"Unexpected response format from vLLM server: {json_response}")
            return text
        except requests.exceptions.RequestException as e:
            raise ValueError(f"Error communicating with vLLM server: {e}")
        except (KeyError, TypeError) as e:
            raise ValueError(f"Error parsing vLLM server response: {e}. Response was: {json_response}")


llm = VLLMServerLLM(vllm_url="[website]:8000/generate", temperature=[website], max_tokens=512)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that answers questions about a given topic."),
        ("human", "{input}"),
    ]
)

chain = prompt | llm


def create_app():
    app = Flask(__name__)

    @app.route("/ask", methods=['POST'])
    def talkToGemini():
        user_input = request.json['input']
        response = chain.invoke({"input": user_input})
        return response

    return app


if __name__ == "__main__":
    app = create_app()
    app.run(host='[website]', port=80)
Then, create a Dockerfile to define how to assemble our image:
# Use an official Python runtime as a parent image
FROM python:3-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in [website]
RUN pip install -r [website]

# Make port 80 available to the world outside this container
EXPOSE 80

# Run [website] when the container launches
CMD ["python", "[website]"]
For our dependencies, create the [website] file containing LangChain and a web framework, Flask:
langchain
flask
Finally, build the container image and push it to Artifact Registry. Don't forget to replace PROJECT_ID with your Google Cloud project ID.
# Authenticate with Google Cloud
gcloud auth login

# Create the repository
gcloud artifacts repositories create images \
  --repository-format=docker \
  --location=us

# Configure authentication to the desired repository
gcloud auth configure-docker [website]

# Build the image
docker build -t [website] .

# Push the image
docker push [website]
After a few moments, your container image should be stored in your Artifact Registry repository.
Create a YAML file with your Kubernetes deployment and service manifests. Let's call it [website], and remember to replace PROJECT_ID.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-deployment
spec:
  replicas: 3  # Scale as needed
  selector:
    matchLabels:
      app: langchain-app
  template:
    metadata:
      labels:
        app: langchain-app
    spec:
      containers:
      - name: langchain-container
        image: [website]
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-service
spec:
  selector:
    app: langchain-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer  # Exposes the service externally
# Get the context of your cluster
gcloud container clusters get-credentials langchain-cluster --region us-central1

# Deploy the manifest
kubectl apply -f [website]
This creates a deployment with three replicas of your LangChain application and exposes it externally through a load balancer. You can adjust the number of replicas based on your expected load.
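If you want to change the replica count later without editing and re-applying the manifest, you can also scale the deployment directly, for example:

kubectl scale deployment/langchain-deployment --replicas=5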
Once the service is deployed, you can get the external IP address of your application using:
export EXTERNAL_IP=$(kubectl get service/langchain-service \
  --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
You can now send requests to your LangChain application running on GKE. For example:
curl -X POST -H "Content-Type: application/json" \
  -d '{"input": "Tell me a fun fact about hummingbirds"}' \
  http://$EXTERNAL_IP/ask
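If you'd rather call the service from Python than from curl, a minimal client could look like the sketch below. It assumes the EXTERNAL_IP variable exported above and the /ask endpoint defined in our Flask app:

import os
import requests

# Read the external IP exported in the shell above
external_ip = os.environ["EXTERNAL_IP"]

# Call the /ask endpoint exposed by the LangChain application
response = requests.post(
    f"http://{external_ip}/ask",
    json={"input": "Tell me a fun fact about hummingbirds"},
    timeout=60,
)
response.raise_for_status()

# The chain returns the raw model output as plain text
print(response.text)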
Here are a few things to keep in mind when running this setup:
- Scaling: You can scale your Gemma deployment independently of your LangChain application based on the load generated by the model.
- Monitoring: Use Cloud Monitoring and Cloud Logging to track the performance of both Gemma and your LangChain application. Look for error rates, latency, and resource utilization.
- Fine-tuning: Consider fine-tuning Gemma on your own dataset to improve its performance on your specific use case.
- Security: Implement appropriate security measures, such as network policies and authentication, to protect your Gemma instance.
Deploying Gemma on GKE and integrating it with LangChain provides a powerful and flexible way to build AI-powered applications. You gain fine-grained control over your model and infrastructure while still leveraging the developer-friendly attributes of LangChain. This approach allows you to tailor your setup to your specific needs, whether it's optimizing for performance, cost, or control.
Check out the LangChain documentation for advanced use cases and integrations.
Dive deeper into GKE documentation for running production workloads.
In the next post, we will take a look at how to streamline LangChain deployments using LangServe.
Run Envoy proxy with Docker

Hey friends, today we will do a short introduction to running Envoy proxy with Docker. We will work with the basic building blocks of Envoy: listeners, clusters, and filters.
docker run --name=proxy-with-admin -d -p 9901:9901 -p 10000:10000 envoyproxy/[website]
This exposes container port 9901 to your host. This way you can access the admin portal at [website]:9901/.
From Docker Desktop, you can see in [website], which contains the configuration of your Envoy instance, that the admin portal is exposed on port 9901 of your container.
Envoy uses listeners to accept client requests. These requests are then forwarded to clusters, which represent your backend services.
In the [website], we see the following configuration for listeners:
listeners:
- name: listener_0
  address:
    socket_address:
      protocol: TCP
      address: [website]
      port_value: 10000
This means that your listener is on port 10000. We have exposed this port with the Docker command: -p 10000:10000.
Therefore, when you hit [website]:10000/ on your local machine, you reach the Envoy listener. When an Envoy listener receives a request, the Envoy filters execute.
In the filters section of our [website] we see:
filter_chains:
- filters:
  - name: envoy.filters.network.http_connection_manager
    typed_config:
      "@type": [website]
      scheme_header_transformation:
        scheme_to_overwrite: https
      stat_prefix: ingress_http
      route_config:
        name: local_route
        virtual_hosts:
        - name: local_service
          domains: ["*"]
          routes:
          - match:
              prefix: "/"
            route:
              host_rewrite_literal: [website]
              cluster: service_envoyproxy_io
      http_filters:
      - name: [website]
        typed_config:
          "@type": [website]
Here we see the type of filter we have applied, which is an HTTP connection manager (you can read more about it in the Envoy documentation).
Your cluster (aka your backend) is service_envoyproxy_io.
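To see the whole path in action, you can send a request through the listener and inspect the proxy through the admin interface. The sketch below assumes the port mappings from the docker run command above, that the proxy is reachable on localhost, and that Python with the requests package is available:

import requests

# Request through the Envoy listener on port 10000;
# the HTTP connection manager routes it to the service_envoyproxy_io cluster
proxied = requests.get("http://localhost:10000/", timeout=10)
print(proxied.status_code)

# The admin interface on port 9901 exposes runtime information,
# for example the state of the configured clusters
clusters = requests.get("http://localhost:9901/clusters", timeout=10)
print(clusters.text)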
If you want to use your own [website] configuration, you need to have an [website] file and mount it into the container you are running.
Run this command in the folder where you have your [website]:
docker run --name=proxy-with-admin -d -p 9901:9901 -p 10000:10000 -v [website] envoyproxy/[website]
The -v [website] option mounts a file or directory from your local machine into a container. In this case, it mounts the local file [website] to the path /etc/envoy/[website] inside the container. This ensures that the container uses the configuration file from your local environment.
How to Use Caffeine with Kotlin Coroutines - Introduction to caffeine-coroutines

The JVM has a widely used caching library called Caffeine.
Since this library is written in Java, it cannot be used directly with Coroutines.
Caffeine provides async support (an interface compatible with CompletableFuture), which makes it possible to use it with Coroutines with some effort. However, unless you have a deep understanding of both Caffeine and Coroutines, the learning curve can be quite steep.
In fact, there are questions on Stack Overflow regarding its usage with Coroutines, indicating that some people struggle with it.
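To give a sense of the effort involved, here is a rough sketch of how you might bridge Caffeine's async API to Coroutines by hand. It assumes the kotlinx-coroutines-jdk8 bridge (future/await) is on the classpath, and it is just an illustration of the manual approach, not a recommended pattern:

import com.github.benmanes.caffeine.cache.AsyncCache
import com.github.benmanes.caffeine.cache.Caffeine
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.delay
import kotlinx.coroutines.future.await
import kotlinx.coroutines.future.future
import java.time.Duration

suspend fun main() {
    val cache: AsyncCache<String, String> = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .buildAsync()

    // The mapping function must return a CompletableFuture, so the suspend
    // loader is wrapped in `future { ... }` on the executor Caffeine hands us,
    // and the resulting future is awaited on the calling coroutine.
    val value = cache.get("key") { _, executor ->
        CoroutineScope(executor.asCoroutineDispatcher()).future {
            delay(1000)
            "value"
        }
    }.await()

    println(value)
}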
To make it easy for everyone to use, I created a library called caffeine-coroutines.
※ It has also been mentioned in the Caffeine README.
First things first, let's add it to the dependencies.
implementation("dev.hsbrysk:caffeine-coroutines:{{version}}")
As for usage, there are almost no differences from the original Caffeine. The only new thing to learn is that by calling buildCoroutine, you can obtain a CoroutineCache instance that works on Coroutines.
suspend fun main() {
    val cache: CoroutineCache<String, String> = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .buildCoroutine() // use buildCoroutine
    val value = cache.get("key") {
        delay(1000) // suspend functions can now be used
        "value"
    }
    println(value)
}
Of course, the loading cache style is also supported. By passing a loader to buildCoroutine , you can obtain a CoroutineLoadingCache instance.
suspend fun main() {
    val cache: CoroutineLoadingCache<String, String> = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .buildCoroutine { // use buildCoroutine
            delay(1000) // suspend functions can now be used
            "value"
        }
    val value = cache.get("key")
    println(value)
}
Caffeine provides Cache and LoadingCache for blocking use cases, and AsyncCache and AsyncLoadingCache for async ( CompletableFuture ) use cases.
As the name suggests, this library focuses solely on providing CoroutineCache and CoroutineLoadingCache for Coroutines use cases.
The introduction of custom APIs/interfaces is kept to a minimum.
Introducing unique APIs/interfaces can confuse users, making adoption difficult and causing trouble if you later want to stop using the library.
Comparison with Aedile (a similar implementation).
There is actually another Kotlin wrapper for Caffeine called Aedile.
However, when I implemented caffeine-coroutines, I encountered the following issues with Aedile:
(Many of these issues have been resolved in the recently released version 2, so using Aedile is also an option.)
Issues with Coroutine scope handling: Ideally, the loader should execute in a scope derived from the caller's scope (otherwise, for example, the Coroutine context will not be inherited), but for some reason it used a separate scope. As a result, there were issues with inheriting things like MDC.
Issues with Coroutine cancellation: Although the scope handling issue mentioned above was addressed in later versions, there was an issue where an exception occurring inside the loader would cancel the calling Coroutine. This was fixed in version 2, but the fix seems a bit odd… (simply using the coroutineScope function to create a child scope should suffice).
Too many custom APIs: To create a cache, you had to use a custom builder called caffeineBuilder. It's easier to understand if you can use the official builder as is, and using the custom builder meant that some features available in the original implementation were unavailable. This has been resolved in version 2 by adopting a style similar to caffeine-coroutines. Additionally, metrics required using a custom API called CacheMetrics, which I also found problematic. This has also been deprecated in version 2.
caffeine-coroutines is a library that makes it easy to use Caffeine with Coroutines. By using the buildCoroutine extension function, you can use Caffeine's features seamlessly in a Coroutines environment.
Although Aedile has improved, it previously had issues with scope handling, cancellation behavior, and excessive custom APIs. caffeine-coroutines focuses on a simple design that leverages the official APIs.
If you want to use Caffeine in a Coroutines environment, give it a try!