
GenAI (Generative AI)

source link: https://wilsonmar.github.io/genai/

Generative AI generates new text, images, audio, and video rather than discrete numbers, classes, and probabilities.

This article is a work in progress.

This article introduces Generative AI (GenAI) on several cloud platforms: Microsoft, Google, and AWS.

AI is a discipline: a body of theories and techniques for making machines act like people, such as by learning.


At Microsoft

Microsoft has an ownership interest in OpenAI, whose ChatGPT exploded in popularity in 2023.

  • “Azure OpenAI” became an offering in March 2023

Microsoft’s GitHub also unveiled its Copilot series for developers using Visual Studio IDEs.

Many of Microsoft’s 365 SaaS offerings (Word, Excel, PowerPoint, etc.) have been upgraded with AI features.

Microsoft Bing Search

  • https://www.linkedin.com/learning/generative-ai-the-evolution-of-thoughtful-online-search by Ashley Kennedy (Managing Staff Instructor at LinkedIn Learning)

Search involves three steps: crawling, indexing, and ranking.

https://lnkd.in/eCDjW4EW
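
As a rough illustration of the indexing and ranking steps above (not how Bing actually works), here is a toy inverted index in Python; all names and documents are invented:

```python
# Toy sketch of search indexing and ranking (illustrative only).
from collections import defaultdict

docs = {
    1: "generative ai creates new text and images",
    2: "search engines crawl index and rank pages",
}

# Indexing: map each term to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Ranking: score documents by how many query terms they contain."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("index images"))  # both docs match one term each
```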

ChatGPT, made available to the public in November 2022, reached 1 million users in less than a week.

Limitations:

  • Biased input data
  • Point-in-time data (frozen in time)
  • Lack of common sense
  • Lack of creativity
  • No understanding of the text it generates
  • Normalization of mediocrity

Tutorials

The “What is Generative AI” course at LinkedIn Learning by Dr. Pinar Seyhan Demirdag (co-founder of Seyhan Lee) is 1 hour and 15 minutes long and has 5 modules:

  • Introduction to Generative AI
  • Generative AI in action
  • Generative AI in the real world
  • Generative AI in the future
  • Next steps

Microsoft has a Microsoft AI Fairness initiative.

The LinkedIn Learning path for generative artificial intelligence includes these courses:

  • https://www.linkedin.com/learning/what-is-generative-ai
  • https://www.linkedin.com/learning/generative-ai-imaging-what-creative-pros-need-to-know
  • https://www.linkedin.com/learning/generative-ai-the-evolution-of-thoughtful-online-search

  • https://www.linkedin.com/learning/generative-ai-for-business-leaders
  • https://www.linkedin.com/learning/ai-accountability-essential-training-16769302
  • https://www.linkedin.com/learning/responsible-ai-principles-and-practical-applications
  • https://www.linkedin.com/learning/foundations-of-responsible-ai
  • https://www.linkedin.com/learning/introduction-to-responsible-ai-algorithm-design
  • https://www.linkedin.com/learning/tech-on-the-go-ethics-in-ai
  • https://www.linkedin.com/learning/what-is-generative-ai/how-generative-ai-workspace by Pinar Seyhan Demirdag

  • https://www.linkedin.com/learning/streamlining-your-work-with-microsoft-bing-chat/understand-how-chat-ai-works by Jess Stratton (LinkedIn Learning Staff Author, Tech Consultant)

GitHub Copilot

https://github.com/features/copilot


At Google

Dr. Gwendolyn Stripling, AI Technical Curriculum Developer at Google Cloud created courses at several sites:

A. https://www.coursera.org/learn/introduction-to-generative-ai/lecture/TJ28r/introduction-to-generative-ai

B. Google created a Generative AI learning path of FREE one-day courses with FREE quizzes (and one HANDS-ON lab in Vertex AI):

  1. Introduction to Generative AI
    • Hallucinations
    • Text-to-image using Stable Diffusion
  2. Introduction to Large Language Models - Google’s Bard AI (https://bard.google.com/) is Google’s answer to OpenAI’s GPT series of large language models.
    • https://www.techrepublic.com/article/google-bard-cheat-sheet/
  3. Introduction to Responsible AI 1 day

  4. Generative AI Fundamentals

  5. Introduction to Image Generation, with diffusion models.

  6. Encoder-Decoder Architecture
  7. Attention Mechanism (2015), for TensorFlow
    • https://www.youtube.com/watch?v=fjJOgb-E41w to improve text translation by giving each hidden state a softmaxed score
  8. Transformer Models & BERT Models 2017-18 for NLP
    • VIDEO: Overview of how Transformers add context to words
    • BERT (Bidirectional Encoder Representations from Transformers), developed by Google in 2018, was trained on Wikipedia & books in two variations: base (12 layers with 768 hidden units and 12 attention heads) and large (24 layers with 1024 hidden units and 16 attention heads).
    • Google found masking 15% of tokens (randomly replacing words with [MASK] tokens) to be the optimal balance, leaving the other 85% unchanged; a separate pre-training task, Next Sentence Prediction (NSP), predicts whether two sentences are adjacent or not (see the masking sketch after this list).
    • BERT input embeddings: Token, Segment, and Position, with [SEP] separator tokens

    • Lab resource: classify_text_with_bert.ipynb from GitHub
    • VIDEO: HANDS-ON walk-through of running the “asl-gup.ipynb” notebook for a Sentiment Analysis classifier_model.fit using Vertex AI TensorFlow Keras with a GPU, accessing the 25,000-record IMDB dataset (trainable=true), optimized for binary accuracy. The trained model is saved to a Google Cloud bucket, uploaded to the Vertex AI Model Registry, deployed to an endpoint (5-10 minutes), tested, then deleted.
  9. Create Image Captioning Models with a CNN and RNN
  10. Introduction to Generative AI Studio for Language, Vision, and Speech. It has a “Model Garden” and Reflection Cards.
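
The masking sketch promised in item 8 above: a minimal, illustrative Python version of BERT-style masked-token preparation, masking roughly 15% of tokens. This is a simplification, not Google’s actual pipeline (which also swaps in random tokens for a fraction of the masked positions):

```python
# Illustrative sketch of BERT-style masked language model input prep:
# randomly mask ~15% of tokens; the model must predict the originals.
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Return (masked tokens, labels); labels are None where unmasked."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)      # model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)     # no prediction needed at this position
    return masked, labels

tokens = "the cat sat on the mat".split()
print(mask_tokens(tokens))
```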

Other courses:

  • Generative AI with Vertex AI: Text Prompt Design, for Language, Vision, and Speech, with a “Model Garden”.

  • https://www.coursera.org/learn/introduction-to-large-language-models On Coursera: Google Cloud - Introduction to Large Language Models


GenAI Summary

Generative AI is abbreviated as GenAI.

Generative AI differs from other types of AI, such as “discriminative AI” and “predictive AI,” in that it doesn’t try to predict an outcome based on grouping/classification and regression. Instead, it performs tasks such as:

  • Text classification, translation among languages, summarization, question answering, and grammar correction

Generative AI is a type of artificial intelligence (AI) that generates new text, images, audio, and video rather than discrete numbers, classes, and probabilities.

Outputs from GenAI include:

  • Text Generation
  • Image Generation (“Deep Fakes”), Image Editing
  • Video generation
  • Speech Generation (text to speech)
  • Decision Making: Recommendations, playing games
  • Explain code line by line
  • Code Generation

GenAI learns from existing data and then creates new content that is similar to the data it was trained on.

GenAI doesn’t require a large amount of labeled data to train on. Instead, it uses a technique called self-supervised learning, which allows it to learn from unlabeled data. This is a huge advantage because it means that generative AI can be used in a wide variety of applications, even when there isn’t a lot of data available.
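
A minimal sketch of why labeled data isn’t needed: in self-supervised learning, the (input, target) training pairs come from the raw text itself, as in this illustrative next-word-prediction example:

```python
# Sketch: self-supervised learning derives (input, target) pairs from raw,
# unlabeled text itself -- here via next-word prediction (illustrative).
text = "generative ai learns from unlabeled data".split()

pairs = [(text[:i], text[i]) for i in range(1, len(text))]
for context, target in pairs:
    print(f"input={context!r} -> predict {target!r}")
```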

A foundation model is a large AI model pre-trained on a vast quantity of data that was “designed to be adapted” (or fine-tuned) to a wide range of downstream tasks, such as sentiment analysis, image captioning, and object recognition.

Large Language Models (LLMs) are a subset of Deep Learning, which is a subset of Machine Learning, which is a subset of Artificial Intelligence. Machine Learning generates models containing algorithms derived from data, instead of being explicitly programmed by humans.

NLP (Natural Language Processing) vendors include:

  • Abnormal
  • Horizon3.ai
  • Darktrace identifies phishing emails using ML
  • SentinelOne

LLM creators:

  • OpenAI - now closed source
  • Google
  • NVIDIA
  • Meta (Facebook PyTorch) - open source

  • UC Berkeley
  • LMU Munich
  • Seyhan Lee, a generative AI art lab that works on Hollywood movies

One way models are created from binary files (images, audio, and video) is “diffusion”, which draws inspiration from physics and thermodynamics. The forward process iteratively adds (Gaussian) noise to training data; the model then learns to reverse (denoise) that process, so that, starting from pure noise, it can generate images that look realistic. Models trained this way are called “Denoising Diffusion Probabilistic Models” (DDPM). GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders) are alternative generative algorithms.
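
A minimal sketch of the forward (noising) half of that diffusion process, assuming a simple linear noise schedule; the schedule, shapes, and values are illustrative:

```python
# Sketch of the DDPM forward (noising) process with a linear beta schedule.
import numpy as np

T = 1000                                   # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)         # noise added per step
alpha_bars = np.cumprod(1.0 - betas)       # cumulative signal retention

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Jump straight to step t: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * noise

x0 = np.ones((4, 4))          # stand-in for a tiny "image"
print(q_sample(x0, t=999))    # near step T, output is almost pure noise
```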

The models generated are “large” because they are the result of being trained on large amounts of data and also because they have a large number of parameters (weights) that are used for a wide variety of tasks, such as text classification, translation, summarization, question answering, grammar correction, and text generation.

The performance of large language models (LLMs) generally improves as more data and parameters are added.

  • Google’s Pathways Language Model (PaLM) has 540 billion parameters; Google’s Switch Transformer model has 1.6 trillion parameters.
  • Google’s GLaM has 1.2 trillion parameters.
  • OpenAI’s GPT-3 has 175 billion parameters.

Such large LLMs require a lot of compute power to train, so they are expensive to create. Thus, LLMs currently are created only by large companies like Google, Facebook, and OpenAI.

LLMs are also called “General” Language Models because they can be used for a wide variety of tasks and contexts.

LLMs are also called “Transformer” Language Models because they use a type of neural network called a Transformer, used for language translation, summarization, and question answering. Transformers use “attention mechanisms” to learn the relationships between words in a sentence. They are called “Transformers” because they transform one sequence of words into another sequence of words, rather than passing a single “hidden state” between individual words as more traditional recurrent Encoder-Decoder models do.
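
A minimal NumPy sketch of the scaled dot-product attention at the heart of Transformers; dimensions and values are illustrative, not any production implementation:

```python
# Minimal sketch of an attention mechanism: softmax(QK^T / sqrt(d)) V.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over a sequence of word vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # how much each word attends to others
    return softmax(scores) @ V         # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((5, 8))   # 5 "words", 8-dim embeddings
print(attention(Q, K, V).shape)           # (5, 8): one context vector per word
```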

Attention models use a “self-attention” mechanism that allows the model to learn the relationships between words in a sentence. VIDEO CS25: Encoder-decoders generate text using either “greedy search” or “beam search”. Greedy search always selects the word with the highest probability at each step, whereas beam search keeps multiple candidate sequences and selects the one with the highest combined probability.
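
A toy sketch contrasting the two decoding strategies, using a hard-coded next-word probability table (purely illustrative):

```python
# Greedy search vs. beam search over a toy next-word probability table.
import math

probs = {  # P(next_word | previous_word)
    "<s>": {"the": 0.5, "a": 0.4, "an": 0.1},
    "the": {"cat": 0.4, "dog": 0.6},
    "a":   {"cat": 0.9, "dog": 0.1},
    "an":  {"owl": 1.0},
}

def greedy(start="<s>", steps=2):
    seq = [start]
    for _ in range(steps):
        nxt = max(probs[seq[-1]], key=probs[seq[-1]].get)  # single best word
        seq.append(nxt)
    return seq[1:]

def beam(start="<s>", steps=2, width=2):
    beams = [([start], 0.0)]                 # (sequence, log-probability)
    for _ in range(steps):
        candidates = [(s + [w], lp + math.log(p))
                      for s, lp in beams
                      for w, p in probs[s[-1]].items()]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    return beams[0][0][1:]

print(greedy())  # ['the', 'dog'] -- picks the locally best word each step
print(beam())    # ['a', 'cat']   -- 0.4*0.9 beats 0.5*0.6 overall
```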

LLMs are also called “Autoregressive” Language Models because they generate text one word at a time, with each new word conditioned on the words generated before it, using a neural network (a Transformer) that learned from a large dataset.

After being developed, models only change when they are fed new data, a process called “fine-tuning” the model.

LLMs are also called “Universal” Language Models because they can be used for a wide variety of human written/spoken languages in prompts and outputs.

Prompt Engineering

A prompt is a short piece of text that is given to the large language model as input, and it can be used to control the output of the model in many ways.

Internally, when given a prompt (a request) GenAI uses its model to predict what an expected response might be, and thus generates new content.

OpenAI charges money to use GPT-4, which accepts a longer prompt than GPT-3.5.

“Dialog-tuned” prompts generate a response similar to a human response in a conversation, with requests framed as questions to the chatbot in the context of a back-and-forth conversation.

Parameter-Efficient Tuning Methods (PETM) are methods for tuning an LLM on custom data without duplicating the model. This is done by adding a small number of parameters to the model, which are then used to fine-tune the model on the custom data.
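
One example of such a method is a low-rank adapter in the style of LoRA; a minimal NumPy sketch, assuming a frozen weight matrix W and small trainable matrices A and B (all names and sizes are illustrative):

```python
# Sketch of a parameter-efficient, LoRA-style adapter: the large frozen
# weight matrix W is untouched; only small matrices A and B are trained.
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                        # model width vs. tiny adapter rank

W = rng.standard_normal((d, d))          # frozen pre-trained weights
A = rng.standard_normal((d, r)) * 0.01   # trainable (d*r params)
B = np.zeros((r, d))                     # trainable, starts at zero

def forward(x):
    """Output = frozen path + low-rank tuned correction."""
    return x @ W + x @ A @ B

x = rng.standard_normal((1, d))
print(forward(x).shape)                                  # (1, 512)
print(f"adapter params: {A.size + B.size} vs frozen: {W.size}")
```

Here only 8,192 adapter parameters would be trained against 262,144 frozen weights, which is why the base model need not be duplicated.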

Checklist for Prompt Engineering:

  • Details about content
  • Context (provide an example of the desired answer)
  • Clear language
  • Tone
  • Aesthetic
  • Role (“Imagine you’re the product manager for a brand-new smartphone company. What are ten potential innovative features that could be added within the next five years?”)
  • Analogies
  • Debate-style questions (for and against)
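
To make the checklist concrete, here is an illustrative Python snippet that assembles a prompt from those elements (the template and all values are invented):

```python
# Illustrative prompt template assembled from the checklist elements above.
template = (
    "Role: {role}\n"
    "Tone: {tone}\n"
    "Context (example answer): {context}\n"
    "Task: {task}"
)

prompt = template.format(
    role="You are the product manager for a brand-new smartphone company.",
    tone="Optimistic but grounded in current technology.",
    context="e.g., 'Feature: satellite messaging -- lets users text off-grid.'",
    task="List ten potential innovative features for the next five years.",
)
print(prompt)
```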

References on prompt engineering appear under “Resources” below.

Limitations

QUESTION: Can GenAI detect emerging security vulnerabilities?

GenAI output is not based on human creativity, but rather on the data that it was trained on.

So GenAI is currently not built to do forecasting.

But many consider GenAI output (currently) “creative” because GenAI can generate content that is difficult to distinguish from human-generated content, such as fake news, fake reviews, and fake social media posts.

Whatever biases were in the inputs will be reflected in GenAI outputs.

Concerns

GenAI systems currently are not designed to be “sentient”: they do not have a sense of self-awareness, consciousness, or emotions. They are also not designed to have a sense of morality, although they can generally recognize whether prompts and generated content are offensive, hateful, or harmful.

Developing responsible AI requires an understanding of the possible issues, limitations, and unintended consequences of AI use. Principles include transparency, fairness, accountability, and scientific excellence. NOTE: “Explainability” is not among the principles because it is not always possible to explain how an AI model works. “Inclusion” is not among the principles because it is not always possible to include everyone in the development of AI models.

“ChatGPT 3.5 has all of the knowledge and confidence of a 14-year-old who has access to Google.” –Christopher Prewitt

“GPT-3 is a powerful tool, but it is not a mind reader. It is not a general intelligence system. It is not a self-aware being. It is not a robot. It is not a search engine. It is not a database. It is not a knowledge base. It is not a chatbot. It is not a question answering system. It is not a conversational AI. It is not a personal assistant. It is not a virtual assistant. It is not a personal knowledge base. It is not a personal knowledge guru.”

Hallucinations (Making Stuff Up)

“Hallucinations” in output are made up by the model and not based on reality. This can happen due to several causes:

  • Input data is not representative of the real world
  • Input data contains noisy or dirty data
  • The model was not trained on enough data
  • The prompt does not provide enough context
  • The prompt does not provide enough constraints

Vendors keep their source of data (corpus) confidential because it can be controversial due to licensing, privacy, and reliability issues.

  • Use of content from books and publications may have copyright concerns.
  • Use of content from websites has licensing concerns, even though the content is publicly contributed.
  • Use of Wikipedia (9 billion documents), Stack Overflow, Reddit, Quora, etc. raises concerns about the usefulness of that data.

To ensure that AI is used responsibly, Google recommends “seeking participation from a diverse range of people”.

Google Bard code generation

  • explain code line by line
  • debug lines of source code
  • translate code from one language to another
  • generate documentation and tutorials about source code

Google AI Studio

Without writing any code:

  • Fine-tune models on custom data
  • Deploy models to production
  • Create chatbots using Google’s PaLM API for Chat
  • Image Generation (generate new images or generate new areas of an existing image)

GenAI Studio, via the PaLM API:

  • Fine-tune models
  • Deploy models to production
  • Create chatbots
  • Image generation
  • Write text prompts to generate content

Google MakerSuite

Google’s MakerSuite is a suite of GUI tools for prototyping and building generative AI models: iterating on prompts, augmenting datasets with synthetic data, deploying models to production, and monitoring models in use.

  • Text to Image (generate new images or generate new areas of an existing image)

Generative AI App Builder

Generative AI App Builder creates apps for generating images.

GenAI API


Text to Image generation

Midjourney (like Apple: a closed API and an art-centric approach)

DALL·E (an open API released by a corporation; technical over design)

Stable Diffusion

  • https://github.com/CompVis/stable-diffusion uses Python Colab Notebooks
  • https://www.gofundme.com/c/help-changes-everything

Users: Stitchfix.com uses it to recommend styles.

https://prisma-ai.com/lensa

https://avatarmaker.com


Resources:

  • OpenAI
  • ChatGPT Discord server
  • Prompt Engineering Guide
  • PromptVine
  • Learn Prompting
  • PromptPapers
  • PromptHub

Video generation

https://www.unite.ai/synthesys-review/

Anomaly Detection

Variational Autoencoders (VAE)

Use cases: finding financial fraud, flaws in manufacturing, and network security breaches.
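
A simplified sketch of the idea: train a compressor on normal data only, then flag inputs that reconstruct poorly. Here a linear SVD-based encoder stands in for a trained VAE, and all data is synthetic:

```python
# Sketch of autoencoder-style anomaly detection: compress normal data to a
# low-dimensional code, reconstruct it, and flag points with high error.
import numpy as np

rng = np.random.default_rng(0)
normal = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10)) * 0.1
normal += rng.standard_normal((1, 10))          # "normal" transactions

# "Train": learn a 3-dimensional code from normal data only.
mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
encode = Vt[:3].T                               # 10-dim -> 3-dim projection

def anomaly_score(x):
    """Reconstruction error: large means x doesn't fit the learned pattern."""
    z = (x - mean) @ encode                     # compress
    x_hat = z @ encode.T + mean                 # reconstruct
    return float(np.sum((x - x_hat) ** 2))

print(anomaly_score(normal[0]))                 # small: fits training data
print(anomaly_score(np.full(10, 25.0)))         # large: flagged as anomalous
```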


References

https://www.unite.ai/zero-to-advanced-prompt-engineering-with-langchain-in-python/


At AWS (coming soon)



GenAI (Generative AI) was published on August 08, 2023.

