AI Bonanza, RedHat OpenShift, Connecting and Migrating from AWS, Equinix and Mainframes plus new features: The August 2023 MinIO Newsletter  

by Anthony Weaver

ALL THINGS AI:

At the risk of sounding snooty – we were on this AI thing years ago and
we have the Wayback Machine to prove it. [2] Kubeflow ships with MinIO
and every major AI framework has documentation on how to work with
MinIO. We are still pedaling hard though and this month has no fewer
than five AI posts. Here is how they break down.

A recurrent theme in AI is efficiency and optimization. This is born of
GPU scarcity and is getting solved with software. MinIO teamed up with
the folks at cnvrg.io to improve the performance and efficiency of large
language models (LLMs) by using Retrieval Augmented Generation (RAG).
RAG is a technique that uses a retrieval model to retrieve relevant
documents from a knowledge base that an LLM can use to augment the
generated text. This can help to improve the accuracy and fluency of the
generated text. The post contains an overview showing how cnvrg.io, RAG
and MinIO enhance LLMs [3].
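
To make the retrieve-then-augment idea concrete, here is a minimal sketch. It is not the cnvrg.io or MinIO implementation: the in-memory knowledge base, the bag-of-words similarity, and the function names (`retrieve`, `build_prompt`) are all assumptions chosen for illustration. A real pipeline would typically use learned embeddings and send the resulting prompt to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base, kept in memory for the sake of the example.
KNOWLEDGE_BASE = {
    "doc-1": "MinIO is a high-performance object store that is compatible with the S3 API.",
    "doc-2": "Kubeflow is a machine learning toolkit for Kubernetes.",
    "doc-3": "Retrieval Augmented Generation adds retrieved documents to the prompt of a language model.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=1):
    """Return the ids of the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: cosine(query_vec, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


def build_prompt(query, k=1):
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(KNOWLEDGE_BASE[doc_id] for doc_id in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Calling `build_prompt("What is an object store compatible with S3?")` places the best-matching document in front of the question, which is the augmentation step.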

Building on the efficiency theme, new author Sidharth Rajaram
demonstrates how to make model serving with PyTorch more efficient by
pairing it with MinIO [4]. Using MinIO as the single source of truth has
distinct advantages and, combined with PyTorch’s Model Archive files
(MAR files), it streamlines serving and reduces model turnaround time
with TorchServe. This one crushed it over on HackerNews and is
well worth your time.

Did we mention the importance of efficiency? Efficient data management
and version control are key to successful ML workflows. One vector to
achieve this is through the power of parallel ML – running
experimentation in parallel with different parameters (for example,
using different optimizers, or using a different number of epochs).
MinIO teams with lakeFS to supercharge your ML experiments and
streamline your development pipeline without compromising on performance
or scalability. [5]
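
As a rough illustration of running experiments in parallel with different parameters, the sketch below fans a small parameter grid out over a thread pool. Everything in it is an assumption for illustration: the toy `run_experiment` function stands in for a real training run, the step size and epoch count are the only parameters varied, and neither lakeFS nor MinIO is involved.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_experiment(params):
    """Toy stand-in for a training run: minimize (w - 3)^2 by gradient descent."""
    step_size, epochs = params
    w = 0.0
    for _ in range(epochs):
        w -= step_size * 2 * (w - 3)  # gradient of (w - 3)^2 is 2 * (w - 3)
    return {"step_size": step_size, "epochs": epochs, "loss": (w - 3) ** 2}


def run_grid(step_sizes, epoch_counts, workers=4):
    """Run every parameter combination concurrently and pick the lowest loss."""
    grid = list(product(step_sizes, epoch_counts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_experiment, grid))
    best = min(results, key=lambda result: result["loss"])
    return best, results
```

For example, `run_grid([0.1, 0.4], [5, 20])` evaluates four combinations and returns the best one alongside the full result list. For CPU-bound training a process pool or separate jobs would be the more realistic choice; a thread pool keeps the sketch short.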

AI/ML SME Keith Pijanowski [6] continues to crank out quality content.
He goes deep on setting up a development machine with MLflow and MinIO
[7] in this post. MLflow is an open-source platform designed to manage
the complete machine learning lifecycle. It supports Tracking, Projects,
Models and a Model Registry. This easy-to-follow recipe for setting up
MLflow and MinIO on a development machine will save you time and effort
researching the MLflow servers and Docker compose configurations. You
are welcome.

Keith also authored a superb piece on optimizing your AI in the throes
of a gold rush. Originally over on The New Stack, it is now on our site.
It has lots of practical advice from someone who has been there – from
starting with a simple model, considering the end-to-end workflow, using
APIs from public LLMs, fine-tuning of public LLMs and more. A great read
for the practitioner or interested developer.

The Architect’s Guide to Storage for AI [8]: Data is the fuel for the AI
revolution and that data needs to reside somewhere. That somewhere is,
more often than not, object storage. Why is that? Well, scalability,
durability and throughput are key reasons. So too are architectural
considerations like a flat namespace, a simple API, the ability to
handle unstructured data and immutability. This post, which originally
appeared in The New Stack, has become required reading in the field.

WHY FILE ON OBJECT IS A BAD IDEA:

This post [9] generated a fair amount of controversy – lighting up the
debate channels on Reddit and HackerNews. It returns for a second month
as our top-clicked post. Sift through the noise, however, and you find
that developers and engineers know the truth – putting a file system on
top of an object store doesn’t work. Learn why in this excellent
example of S3fuse on top of MinIO by Dileeshvar Radhakrishnan [10].

OPENSHIFT DOUBLE:

MinIO sees the “ocp” snippet in a lot of console URLs. OpenShift is
powerful, open and fully featured. We have two posts this month, both
from AJ, on working with OpenShift. As a primer, start with the MinIO
and OpenShift on your laptop [11] post.

Then go deeper with the steps that you need to take when you want to
set up your cluster (an actual cluster, not just a test machine) for a
proof of concept [12], or any other reason, so you have something more
robust to experiment with.

RETHINKING BACKUP AND RESTORE:

Scale defines the modern data-driven enterprise. What used to cut it on
the scale front no longer does. TBs give way to PBs, which give way
to EBs. This changes the math on backup and restore. There exists a
line, let’s call it a PB, where you need to start to think
differently. Ugur Tigli returns with some outstanding advice on what to
think about when and how to execute [13].

JUMBO FOR HIGH-PERFORMANCE DATABASE BACKUPS:

Building on the message above, it turns out that modern databases have
grown too big for SAN/NAS architectures. The inability of SAN/NAS to
scale creates significant issues for the enterprise.

As a result, at the behest of one of our Fortune 100 customers, MinIO
built a tool we call Jumbo. Jumbo can handle massive dumps of data by
creating parallel streams to upload segments of large objects. That
collection of objects can be read back with a single restore command. In
essence, Jumbo organizes a large file, such as a database snapshot, into
a single large stream of objects and uploads them rapidly in parallel to
object storage using the S3 API. The only limitation to Jumbo is network
speed (more on this below).
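
The general pattern described here, splitting one large payload into segments, uploading them in parallel, and reading them back in order, can be sketched as follows. This is not Jumbo’s code: the `InMemoryStore` class is a hypothetical stand-in for an S3-compatible bucket, and the key layout and function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for an S3-compatible bucket."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]

    def list_keys(self, prefix):
        return sorted(key for key in self.objects if key.startswith(prefix))


def backup(store, name, payload, segment_size, workers=4):
    """Split the payload into segments and upload them concurrently."""
    segments = [
        payload[offset:offset + segment_size]
        for offset in range(0, len(payload), segment_size)
    ]

    def upload(item):
        index, chunk = item
        # Zero-padded index so that lexical key order matches segment order.
        key = f"{name}/segment-{index:08d}"
        store.put(key, chunk)
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload, enumerate(segments)))


def restore(store, name):
    """Reassemble the original payload from its segments with one call."""
    return b"".join(store.get(key) for key in store.list_keys(f"{name}/"))
```

A snapshot backed up with `backup(store, "db-snapshot", payload, segment_size=300)` comes back byte for byte from `restore(store, "db-snapshot")`. Against a real object store the segment uploads would be network calls, which is where the parallelism pays off.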

If you back up large databases in your workplace, you have to read this post [14]. Fast
and scalable are things you want from your software and it is what Jumbo
delivers.

PERFORMANCE TESTING MINIO:

MinIO’s speed is legendary. Our benchmarks remain unchallenged, often
for years. Because of this, there are expectations around MinIO’s
performance and when they are not met, the question is always – why? To
answer these questions we built additional tools – with the goal of
assessing HW performance (network and drive). The first is Dperf [15],
a drive performance measurement tool that anyone can use to identify
problem drives. The second is FIO [16] (flexible I/O tester), which is
used to simulate a given I/O workload. The third is HPERF [17], a tool
that measures the maximum achievable bandwidth between a specified
number of peers, reporting receive and transmit bandwidth for each peer.

Matt Sarrel [18] does a superb job of explaining what to use when and why
[19].

YOUTUBE:

Last month in the MinIO Operator series we posted Kubernetes modules
one [20] and two [21] which covered an overview of Kubernetes and how to
set up a lab environment. This month we wrapped up the series with our
third and final module, MinIO Operator Kubernetes with Manifest
Files, which uses the lab environment built in module two to run a
series of labs on the MinIO Operator.

The final module is split into four videos. The introductory video [22]
lays out the goals for the module and discusses temporary and long-term
access to the MinIO S3 API and Console. Modifying and Deploying [23]
discusses the numerous different Kubernetes options for MinIO (EKS, GKE,
AKS, etc.) as well as deploying and modifying manifest files to use the
MinIO Kubernetes Operator. K3s Single Node Lab [24] walks through how to
install MinIO on Kubernetes with manifest files, focusing specifically
on K3s. And, finally, K3d Multi-Node Lab [25] takes you through how to
install MinIO on Kubernetes with manifest files, specifically the K3d
install.

Want to get to know the MinIO team better? Our MinIO Meet the Engineers
video series is live. Learn about the heart and soul of MinIO with
insights directly from our engineering team. Meet Poorna Krishnamoorthy
[26], Lead Engineer for Active-Active-Replication and Jill Inapurapu
[27], Lead UI/UX Developer for the MinIO Console and SUBNET.

Subscribe to our YouTube channel [28] to stay up to date with our latest
videos.

THE GREAT MIGRATION + GETTING AT MAINFRAME DATA:

Despite what the big three will tell you, repatriation is a thing. The
CFO is in the conversation now and the CIO better get used to it. There
is a good reason too – MinIO has a publicly traded customer who migrated
out of a major cloud onto MinIO and improved the gross margin of the
ENTIRE business by more than 4%. That moves stock prices.

Because of the interest we have in this area, we have two core posts and
one more utilitarian one to share this month. The first deals with the
best practices in moving from AWS to MinIO. Matt Sarrel covers it all –
from estimating costs to what you need to do [29]. The second post deals
with moving from AWS to MinIO on Equinix Metal [30]. In this case, AJ
goes deep on how to do it and what to expect.

We don’t want to discriminate against legacy data though and so we
show you how to connect with mainframes and Windows machines. For big
banks and retailers, where billions upon billions of transactions go on
every day, not everything is stored on the latest systems using modern
protocols. Many times, and for a variety of reasons, they are required
to keep using these legacy systems. We have invested in making legacy
data sharing easier, providing mechanisms and integrations to connect
the legacy system with MinIO and leverage the data for other modern
applications. Go deep [31].

NEW AND NOTABLE – RELEASE NOTES FROM JULY 2023

ICYMI, this is a new feature we will have each month to ensure you know
about all the goodness that comes in our weekly release cycles. We want
to reiterate our strong recommendation to upgrade to the latest release.
Remember that MinIO has issued an alert on the Information Disclosure in
Cluster Deployment vulnerability. This vulnerability is being actively
exploited according to the U.S. Cybersecurity and Infrastructure
Security Agency (CISA), which has added it to its Known Exploited
Vulnerabilities (KEV) catalog.

We shipped about 75 features and bug fixes across 4 releases in the
month of July. Some of the notable ones are as follows: We made several
updates to the metrics to provide better visibility, adding more
metrics [32] to the cluster endpoint along with an updated dashboard
[33]. In addition, we added a brand new endpoint called bucket [34],
with a dashboard coming in the near future, as well as active disk
health [35] checks. We added a linter [36] for our Helm chart so that
various code syntax and formatting rules are followed when running
CI/CD pipelines. Last but not least, we’ve upgraded the console to
v0.31.0 [37]. Please visit our releases [38] page for more information.

Content distributors have a legal copyright requirement to ensure that
every unit of recording is mapped to an object on the storage system and
that each unit is copied a predetermined number of times. These units
are generally generated when you rewind live TV by a few seconds to
several minutes. To address this, MinIO built the fan-out feature [39],
which we’ve covered in the API [40]: instead of a single object, one
call writes multiple objects, defined by a list of
PutObjectFanOutRequest.
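
The fan-out semantics, one payload in and many objects out, can be modelled in a few lines. The sketch below is a plain-Python model and not the MinIO SDK: the `FanOutRequest` dataclass is a hypothetical name loosely echoing PutObjectFanOutRequest, its fields are assumptions, and the bucket is just a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FanOutRequest:
    """Hypothetical description of one object to create from the shared payload."""
    key: str
    user_metadata: dict = field(default_factory=dict)


def put_object_fan_out(bucket, data, requests):
    """Write the same payload once per request and return the keys written."""
    written = []
    for request in requests:
        bucket[request.key] = {
            "data": data,
            "metadata": dict(request.user_metadata),  # copy so entries stay independent
        }
        written.append(request.key)
    return written
```

A single call such as `put_object_fan_out(bucket, segment, [FanOutRequest("copy-1"), FanOutRequest("copy-2")])` leaves two independent objects holding the same recording unit, which is the one-write, many-copies behaviour the requirement calls for.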

BITS AND BYTES:

July exploded with bits and bytes:

We published a nice little case study with the folks from UCE Systems.
The client had to solve a Hadoop migration problem [41] – but with
legacy HW. Together, we made it work. We talk more about the partnership
here [42].

Did you see Redpoint’s InfraRED 100 [43]? The list covers the most
important private companies powering the modern cloud operating model,
and MinIO makes the cut. Super fun for AB and Garima.

Don’t forget that we hit the NASDAQ for the second time. Let’s make
it a hat-trick.

The hyper-talented Aleksey Timin over at ReductStore wrote a piece in
DevTo about how to use MQTT data storage [44] – calling out MinIO for
its edge advantages.

Divine Odazie gave us a shout out in their Spacelift story on Kubernetes
Sidecar Container – Best Practices and Examples [45].

We are seeing a lot of our stuff show up on HackerNoon (so much so that
we are now posting there). This one is on the challenges of tackling AI
at scale – on the cloud or on-prem [46].

ICYMI in person – here is an excellent session from NAB with Satish
Ramakrishnan on building an object store to support the world’s
largest streaming entertainment company [47]. Many hundreds of
petabytes.

Neel over on DevTo had a nice piece on converting a PDF document into
images with Python, using MinIO as the object store [48]. We see a fair
amount of these performance-oriented document store plays.

Another DevTo article by Teja Kummarikuntla [49] for ToolJet delves into
the process of leveraging Google Sheets and ToolJet to create a
streamlined inventory and order management app, with a MinIO
shoutout.

An SRE writer at InfraCloud Technologies posted an article [50] about
simplifying Kubernetes native testing with TestKube using MinIO.

This month saw another awesome post from Vasileios Anagnostopoulos [51],
this time on Apache Airflow addressing the challenges of ETL that data
engineers deal with.

Umit Cakmak from Altogic and Agnost tweeted [52] that they’re using
MinIO as the default storage provider of Agnost cluster, praising our S3
compatibility.

Bharat Chaudhury goes into MinIO Custom Authorization scenarios [53] in
this detailed Medium post. A nice piece of work.

The talented Adeesh Acharya, who also writes for Cloud Native Daily,
pens an excellent article on Replacing your FileSystem with MinIO [54].

Software developer Lukasz Moskwa’s Medium article [55] explains MinIO
as the preferred open source alternative to AWS S3. His article offers a
guide to installation, bucket management, user handling, and configuring
a static file web server for secure and controlled access.
