
Show HN: I stripped DALL·E Mini to its bare essentials and converted it to Torch

 2 years ago
source link: https://news.ycombinator.com/item?id=31903076


If people just want to run text to image models locally by far the easiest way I've found on Windows is to install Visions of Chaos.

It was originally a fractal image generation app but it's expanded over time and now has a fairly foolproof installer for all the models you're likely to have heard of (those that have been released anyway).

https://softology.pro/tutorials/tensorflow/tensorflow.htm

also, https://pollinations.ai/

Needs a Google account to run Colab (DALL·E itself needs Colab Pro, but other models run on the free tier).

edit: not local! but very handy.

I can't get any of the image generation models to work; they all fail with an error message. Did you have luck with any of them?
not really surprising, but

<caveat emptor>

bare minimum GPU: NVIDIA 2080 with 8GB VRAM

300 GB of disk space

</caveat emptor>

It should probably have an option to download just one model instead of all of them. 300GB is a lot.
Alas, the very first point was already a no-go for me
Where did you get these? Couldn't find it, might've missed something.
Thank you, I've been looking for something like that, and it looks very cool judging by this tutorial that shows it in action, creating an image and displaying the results as it goes:

https://youtu.be/4_LgrAL7EWg?t=163

Visions of Chaos is amazing.

It's great if you want to run more "classic" AI algorithms as well!

I guess we will have text-to-image startups in the next batch of YC.
Pornhub is working on something I hear
It's only a matter of time before we get a good NSFW image generation algorithm. Text erotica generation is already a not-insignificant part of the public AI world (remember AI dungeon? Most people used it for porn). The question is whether it's going to come out of the established adult industry or not. There are clear benefits (no real humans needed anymore, personalized fetish material forever, etc.), the only issue is whether they're willing to deal with the inevitable bad press (e.g. fake images of real people, taboos like underage content or disturbing imagery). I wouldn't be surprised if any models from the adult industry will basically be heavily gimped from the start to avoid liability.
I feel like anybody, especially women, who has high-res photos/videos of themselves, nude or not, on the internet is going to wake up one day to find they have been turned into a porn star.

For example, kpop idols. There are high-res 60fps fancams of them dancing, there are high-res broadcasts, images from all angles.

There's already a huge market for deep fake kpop and I believe that they are in for a reckoning.

which is an awful awful thing ;)

Your last sentence combined with your introductory visual of somebody “waking up” one day as a pornstar is pretty disturbing. Anybody waking up to discover they are the subject of pornography would be experiencing a traumatic public humiliation. Check your head.

BTW, basically every “female” along with male out there has high resolution photos on social media.

> traumatic public humiliation. Check your head.

and this is my fault how?

I agree with the other commenter and feel that the connotations of your post, the hints of your enjoyment at the prospect of this potential future, are disturbing.
It's your lecherous winking that's so pathetic.
I'm not sure whether you actually think this is good or bad but honestly I bet it would actually be a net positive, in a "if everybody's a pornstar, nobody is" kind of way. If you can generate nsfw video from a picture of a face with the press of a button, it will probably cease to be a thing that matters to anybody within a generation or two, in the same way we think about Photoshop today. It might be a bit of an awkward transition though since it definitely offends most modern people's sensibilities.

Let's be real though, it's the anime-style generation models that are going to be the pioneers in this field.

It would be an awful thing and it's creepy that you seem to think otherwise.
Possibly related: I was at Cornell when there was a guy who was an excellent artist and was making money by selling pornographic art of other actual students. Needless to say, a stop was put to this once word got out...
wandb forbids reusing the models and other information, regardless of how they are used, so they should find another source for their models.

EDIT: as I am being accused of inventing this, I will quote the terms of agreement and license. Perhaps the founder has not read them, or someone without training in writing proper terms and agreements drafted them, but the restrictive use of "Materials" does apply to the hosted software.

Note that there is no formal definition of "Materials" or "Service", so it applies to all the contents of the webpage, including the software stored there: https://wandb.ai/site/terms

I quote it:

2. Use License

Whether you are accessing the Services for personal, non-commercial transitory viewing only (our free license for individuals only), for academic use, or for commercial purposes (our subscription package for businesses), permission is granted to temporarily download one copy of the information or software (the “Materials”) from our website. This is the grant of a license, not a transfer of title, and under this license, you may not:

a. Modify or copy the Materials;

b. Use the Materials for any commercial purpose, or for any public display (commercial or non-commercial);

c. Attempt to decompile or reverse engineer any software contained in the Materials;

d. Remove any copyright or other proprietary notations from the Materials; or

e. Transfer the Materials to another person or "mirror" the Materials on any other server. This license shall automatically terminate if you violate any of these restrictions and may be terminated by us at any time. Upon terminating your viewing of these materials or upon the termination of this license, you must destroy any downloaded materials in your possession whether in electronic or printed format.

f. Utilize our personal license for individuals for commercial purposes and any such use of our personal license for commercial purposes (e.g. using your corporate email) may result in immediate termination of your license.

Founder of Weights & Biases here (wandb). We don't forbid anything; models are the property of the people who created them.

Why do you think that?

EDIT: I'll respond with an edit, since you did. Look at sections 3b and 3c in the terms; they cover Models and other user content specifically. Those are user property, not our property. But I can see how this is confusing. We will clarify it.

Did you write your own terms of agreement yourselves rather than having a lawyer do it? I signed up today and that was explicitly written in the license agreement, point 2. Note that there is no formal definition of "Materials" or "Service", so it applies to all the contents of the webpage, including the software stored there, and as soon as anything is ambiguous the interpretation is open (or arbitrary).

https://wandb.ai/site/terms

I quote it:

2. Use License

Whether you are accessing the Services for personal, non-commercial transitory viewing only (our free license for individuals only), for academic use, or for commercial purposes (our subscription package for businesses), permission is granted to temporarily download one copy of the information or software (the “Materials”) from our website. This is the grant of a license, not a transfer of title, and under this license, you may not:

a. Modify or copy the Materials;

b. Use the Materials for any commercial purpose, or for any public display (commercial or non-commercial);

c. Attempt to decompile or reverse engineer any software contained in the Materials;

d. Remove any copyright or other proprietary notations from the Materials; or

e. Transfer the Materials to another person or "mirror" the Materials on any other server. This license shall automatically terminate if you violate any of these restrictions and may be terminated by us at any time. Upon terminating your viewing of these materials or upon the termination of this license, you must destroy any downloaded materials in your possession whether in electronic or printed format.

f. Utilize our personal license for individuals for commercial purposes and any such use of our personal license for commercial purposes (e.g. using your corporate email) may result in immediate termination of your license.

Thank you for responding!

I am certainly not a lawyer; however, 3b and 3c (from the terms link you posted) state that user content, specifically including Models, is the property of the user.

Are you saying you think there is a conflict between 2 and 3b, 3c, or did you miss section 3?

``` 3. Intellectual Property & Subscriber Content

a. All right, title, and interest in and to the Services, the Platform, the Usage Data, the Aggregate Data, and the Customizations, including all modifications, improvements, adaptations, enhancements, or translations made thereto, and all proprietary rights therein, will be and remain the sole and exclusive property of us and our licensors.

b. All right, title, and interest in and to the Subscriber Content, including all modifications, improvements, adaptations, enhancements, or translations made thereto, and all proprietary rights therein, will be and remain Subscriber’s sole and exclusive property, other than rights granted to us to enable (i) Subscriber to process its data on the Platform, and (ii) us to aggregate and anonymize Subscriber Content solely to improve Subscriber's user experience.

c. Subscriber Content means any data, media, and other materials that Subscriber and its Authorized Users submit to the Platform pursuant to this Agreement, including, without limitation, all Models and Projects, and any and all reproductions, visualizations, analyses, automations, scales, and other reports output by the Platform based on such Models and Projects. ```

Sorry Slewis, I cannot reply to your other comment with another subcomment.

This is an ambiguity, and it is an issue that, as founder, you should address: it could be read as a self-contradictory agreement, which could then invalidate part of the agreement (as I've seen with informal open-source licenses, ill-formed patents that ended up bypassed, etc.).

A way to solve that might be a glossary of what you mean by each term; however, I do recommend using a lawyer for such things, as this sort of mistake can become expensive later on.

Yes, I can certainly see how this is confusing. We will work with our legal team to clarify the terms.
Go buy a copy of https://www.survivingiso9001.com/ [No affiliation] for your lawyers. Not directly for the subject matter, but as a cautionary tale about consequences of word choice.
I spun up an AWS Ubuntu EC2 instance with 2 Tesla M60s. When I run

    python3 image_from_text.py --text='alien life' --seed=7

I get this error:

    detokenizing image
    Traceback (most recent call last):
      File "/home/ubuntu/work/min-dalle/image_from_text.py", line 44, in <module>
        image = generate_image_from_text(
      File "/home/ubuntu/work/min-dalle/min_dalle/generate_image.py", line 74, in generate_image_from_text
        image = detokenize_torch(image_tokens)
      File "/home/ubuntu/work/min-dalle/min_dalle/min_dalle_torch.py", line 107, in detokenize_torch
        params = load_vqgan_torch_params(model_path)
      File "/home/ubuntu/work/min-dalle/min_dalle/load_params.py", line 11, in load_vqgan_torch_params
        params: Dict[str, numpy.ndarray] = serialization.msgpack_restore(f.read())
      File "/usr/local/lib/python3.10/dist-packages/flax/serialization.py", line 350, in msgpack_restore
        state_dict = msgpack.unpackb(
      File "msgpack/_unpacker.pyx", line 201, in msgpack._cmsgpack.unpackb
    msgpack.exceptions.ExtraData: unpack(b) received extra data.

I get a similar error running it locally (not sure if related, but it also can't find my GPU, which is a 3080ti and should be sufficient):

    Traceback (most recent call last):
      File "/home/pmarreck/Documents/min-dalle/image_from_text.py", line 44, in <module>
        image = generate_image_from_text(
      File "/home/pmarreck/Documents/min-dalle/min_dalle/generate_image.py", line 75, in generate_image_from_text
        image = detokenize_torch(torch.tensor(image_tokens))
      File "/home/pmarreck/Documents/min-dalle/min_dalle/min_dalle_torch.py", line 108, in detokenize_torch
        params = load_vqgan_torch_params(model_path)
      File "/home/pmarreck/Documents/min-dalle/min_dalle/load_params.py", line 12, in load_vqgan_torch_params
        params: Dict[str, numpy.ndarray] = serialization.msgpack_restore(f.read())
      File "/home/pmarreck/anaconda3/lib/python3.9/site-packages/flax/serialization.py", line 350, in msgpack_restore
        state_dict = msgpack.unpackb(
      File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb
    msgpack.exceptions.ExtraData: unpack(b) received extra data.
Can anyone give instructions for M1 Max MBP? I had a compilation issue in building the wheel for psutil that looks like "gcc-11: error: this compiler does not support 'root:xnu-8020.121.3~4/RELEASE_ARM64_T6000'" (gcc doesn't support ARM yet?)

What toolchain will get it working on Mac?

Which GCC are you using?

    ~ which gcc
    /usr/bin/gcc
    ~ gcc --version
    Apple clang version 13.1.6 (clang-1316.0.21.2.5)
    Target: arm64-apple-darwin21.5.0
    Thread model: posix
    InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Running the "alien life" example from the README took 30 seconds on my M1 Max. I don't think it uses the GPU at all.

I couldn't get the "mega" option to work, I got an error "TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16" (looks like a known issue https://github.com/kuprel/min-dalle/issues/2)

Edit: installing flax 0.4.2 fixes this issue, thanks all!

The thread now has a fix. As for the GPU, it's possible to get it working with some extra steps https://github.com/google/jax/issues/8074

Macbook Pro M1 Pro numbers (CPU):

    python3 image_from_text.py --text='court sketch of godzilla on trial' --mega   640.24s user 179.30s system 544% cpu 2:30.39 total
From reading that thread it didn't sound like GPU was fully supported yet, were you able to get it working?
Pretty much identical on M1 Max:

    python3 image_from_text.py --text='a comfy chair that looks like an avocado'  612.30s user 180.72s system 552% cpu 2:23.52 total

Thanks for catching this. I just updated it so that it should work with the latest flax.
Changing the flax version to 0.4.2 (currently 0.5.2) will work.

So much for semver :(

0.y.z is kind of an "all bets are off" situation in semver: https://semver.org/#spec-item-4

> Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

Variant schemes like the one respected by Cargo (https://doc.rust-lang.org/cargo/reference/semver.html) aren't usually much different.

> Initial development releases starting with "0.y.z" can treat changes in "y" as a major release, and "z" as a minor release.

What is wandb.ai, and why does it keep asking for an API key?

It's not listed in the requirements

I've posted it as an issue

Founder of Weights & Biases here (wandb). There are a couple of issues raised in this thread: an API key shouldn't be required to download a public model, and the cache in the home directory is annoying for this case. We will fix them.
Thanks for the tip! I just updated the colab to login anonymously
Oh, great! HN as bug resolution mechanism++.
And what about the install script? That one is still failing.
> What is wandb.ai

Weights & Biases

> Why does it keep asking for an API key

From the README:

the Weight & Biases python package is used to download the DALL·E Mini and DALL·E Mega transformer models

It might not be obvious you need an account if you aren't in the field though.

Why is this needed to download the model?

I'd prefer to download it myself and choose where I put it too.

It now uses a hashed filename in a config directory in your home directory for this. I dislike that and want control over where I put models: make it more self-contained instead of spreading random directories all over the OS, and let me provide models as input by file path.

This feedback is about the DALL·E Mini playground instead, but it does the same thing. If this one is stripped to bare essentials, I'd expect this type of dependency to be stripped too.

Edit: I don't want to seem like complaining too much though and am very happy with these open models and tooling for them. Thanks!

It’s not, it just makes it easier. Should be pretty simple to modify the code to work the way you want.
SaaS dashboard for monitoring/metrics.
Does it just download pre-trained DALL-E Mini models and generate images using them? Because I can't seem to find any logic in that repo other than that. I'm not into that field, just curious if I'm missing something.
To add to the sibling comment. The challenge is not converting the weights as such. Pre-trained model weights are just arrays of numbers with labels that identify which layer/operation they correspond to in the source model. The challenge is expressing the model in code identically between two frameworks and programmatically loading the original weights in, since these models can have hundreds of individual ops. Hence why you can't just load a PyTorch model in Tensorflow or vice versa.

There are tools to convert to intermediate formats, like ONNX, but they are limited and don't work all the time. The automatic conversion tools usually assume that you can trace execution of the model for a dummy input and usually only work well if there isn't any complex logic (e.g. conditions can be problematic). Some operations aren't supported well, etc.

This isn't always technically difficult, but it's tedious because it usually involves double checking that at all steps, the model produces identical outputs for a given input. An additional challenge when transferring weights is that models are fragile and minor differences might have large effects on the predictions (even though if you trained from scratch, you might get similar results).

Also for deployment, the less cruft in the repository the better. A lot of research repositories end up pulling in all kinds of crazy dependencies to do evaluation, multiple big frameworks etc.
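
For illustration, a minimal sketch of the name-matching approach described above, in PyTorch. The module, layer names, and shapes here are hypothetical (not min-dalle's actual ones); the real work in a port like this is making the target module's structure and parameter naming line up with the source checkpoint.

    # Sketch of porting weights between frameworks by matching names.
    # The source checkpoint is assumed to be a flat dict of numpy arrays
    # (e.g. restored from a msgpack/flax file); the layer names are made up.
    import numpy as np
    import torch

    class TinyEncoder(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = torch.nn.Embedding(1024, 64)
            self.proj = torch.nn.Linear(64, 64)

        def forward(self, tokens):
            return self.proj(self.embed(tokens))

    # Pretend these came from the original (non-PyTorch) checkpoint.
    source_params = {
        "embed/embedding": np.random.randn(1024, 64).astype(np.float32),
        "proj/kernel": np.random.randn(64, 64).astype(np.float32),  # stored (in, out)
        "proj/bias": np.zeros(64, dtype=np.float32),
    }

    model = TinyEncoder()
    with torch.no_grad():
        model.embed.weight.copy_(torch.from_numpy(source_params["embed/embedding"]))
        # Dense kernels are often stored transposed relative to torch.nn.Linear.
        model.proj.weight.copy_(torch.from_numpy(source_params["proj/kernel"]).T)
        model.proj.bias.copy_(torch.from_numpy(source_params["proj/bias"]))

    model.eval()
    print(model(torch.tensor([[1, 2, 3]])).shape)  # torch.Size([1, 3, 64])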

I don't understand why execution of a model with the same layers and weights would be different between PyTorch and Tensorflow.

Is it a problem of accumulation of floating-point errors in operations that are done in a different order and with different kinds of arithmetic optimisations (so that they would be identical if they used un-optimised symbolic operations), or is there something else in the implementation of a neural network that I'm missing?

In principle you can directly replace function calls with their equivalents between frameworks, this works fine for common layers. I've done this for models that were trained in PyTorch that we needed to run on an EdgeTPU. Re-writing in Keras and writing a weight loader was much easier than PyTorch > ONNX > TF > TFLite.

Arithmetic differences do happen between equivalent ops, but I've not found that to be a significant issue. I was converting a UNet and the difference in outputs for a random input was at most O(1e-4), which was fine for what we were doing. It's more tedious than anything else. Occasionally you'll run into something that seems like it should be a find+replace, but it doesn't work because some operation doesn't exist, or some operation doesn't work quite the same way.
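
A sketch of the kind of equivalence check being described, with a toy layer standing in for a real model: push the same input through both implementations and look at the maximum absolute difference, tolerating small float noise. Here "framework B" is just the same affine map rewritten in numpy with the ported weights.

    # Output-equivalence check between two implementations of the same layer.
    import numpy as np
    import torch

    torch.manual_seed(0)
    layer = torch.nn.Linear(16, 8)       # "framework A" op
    x = torch.randn(4, 16)

    w = layer.weight.detach().numpy()    # (out, in), the "ported" weights
    b = layer.bias.detach().numpy()

    y_a = layer(x).detach().numpy()
    y_b = x.numpy() @ w.T + b            # "framework B" reimplementation

    diff = np.abs(y_a - y_b).max()
    print(f"max abs difference: {diff:.2e}")
    assert diff < 1e-4, "ported layer diverges more than expected"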

It's just that expressing those "layers and weights" in code is different in Tensorflow and Pytorch. I think a good parallel would be expressing some algorithm in two programming languages. The algorithm might be identical, but JS uses `list.map(fn)` and Python uses `map(fn, list)`, and JS doesn't have priority queues in the "standard lib" while Python does, etc. Similarly, the low-level ops and higher-level abstractions are (slightly) different in Pytorch and Tensorflow.

I'm not too familiar with Tensorflow, so I can't give an example there, but a similar issue I recently faced when converting a model from Pytorch to ONNX is that Pytorch has a builtin discrete fourier transform (DFT) operation, while ONNX doesn't (yet. They're adding it). So I had to express a DFT in terms of other ONNX primitives, which took time.
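
For what it's worth, the usual workaround pattern is to rebuild the missing op from primitives the target format does support. A minimal sketch of a DFT written only as matrix multiplications, checked against numpy; this illustrates the idea, it is not the code from that conversion:

    # A DFT expressed only as matmuls against precomputed cosine/sine
    # matrices, i.e. ops that simple export targets tend to support.
    import math
    import numpy as np
    import torch

    def dft_as_matmul(x: torch.Tensor) -> torch.Tensor:
        # x: (..., N) real signal; returns (..., N, 2) holding real/imag parts.
        n = x.shape[-1]
        k = torch.arange(n, dtype=x.dtype)
        angles = 2 * math.pi * k[:, None] * k[None, :] / n  # (N, N)
        real = x @ torch.cos(angles)
        imag = -(x @ torch.sin(angles))
        return torch.stack([real, imag], dim=-1)

    x = torch.randn(128, dtype=torch.float64)
    ours = dft_as_matmul(x)
    ref = np.fft.fft(x.numpy())
    assert np.allclose(ours[..., 0].numpy(), ref.real, atol=1e-8)
    assert np.allclose(ours[..., 1].numpy(), ref.imag, atol=1e-8)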

In principle all operations can be translated between frameworks, even if some ops aren't implemented in one or the other. This, however, depends on whether the translation software supports graph rewriting for such nodes.

Lambdas and other custom code are also problematic, as their code isn't necessarily stored within the graph.

Seems like it'll be a serious issue for people hoping we can someday upload human brains into machines if we can't even transfer models from TensorFlow to PyTorch reliably.
Unrelated problems, really. Having written such a translation library, I can say with confidence that the only reason for this is lack of interest in it.

Graph to graph conversion can be tricky due to subtle differences in implementation (even between different versions of the same framework), but it's perfectly possible, though not many utilities go all the way to graph rewriting if required.

Problems arise with custom layers and lambdas, which are not always serialised with the graph depending on the format.

Human brains have high degrees of plasticity -- our brain is much more generic than its usual functional organization would suggest. I don't think we'd be able to upload brains ("state vectors" was the sci-fi buzzword) before digital supports are able to emulate that.
They converted the original JAX weights to the format that Pytorch uses. Because JAX is still fairly new, it can be a lot easier to get Pytorch to run on e.g. CPU. I do find the number of upvotes interesting and I imagine many people just upvote things that have DALLE in the title, to a degree.

Not to discourage the OP of course, great work.

Look how much easier it is to install & run; people are interested in and upvoting the result, not how much work was (or wasn't) required to achieve it.
Fair enough - that makes sense, just haven't had a chance to dive into it yet.
It still seems to require JAX somewhere to work.

On my desktop, running the example

> python image_from_text.py --text='alien life' --seed=7

results in

> RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. You may be able work around this issue by building jaxlib from source.

Unfortunately, following the instructions to build jaxlib from source (https://jax.readthedocs.io/en/latest/developer.html#building...) results in several 404 Not Found errors, which later cause the build to stop when it tries to do something with the non-existent files.

Unfortunately, it looks like I won't be running this today.

It defaults to JAX. To use torch, add the --torch parameter.
Anyone have some stats on inference time and RAM requirements? (on specific hardware)
I have a 2019 MacBook Pro, 2.4GHz quad-core i5, 8GB RAM, with an Intel graphics card:

    python3 image_from_text.py --text='a happy giraffe eating the world' --seed=7  154.61s user 22.18s system 262% cpu 1:07.40 total

    WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

As you can see, it took 1 minute 7 seconds to complete.

I assume it would be much faster with a grunty graphics card

Using an RTX 3090 (NVIDIA GPU with 24GB of VRAM):

Mini = 5.33 s

Mega = 14.7 s

Update: about 1/2 that time is just loading the model, so if you load the model and then generate multiple images, it drops to:

Mini = 3.91 s

Mega = 8.86 s

NB: Needs a Weights & Biases account in order to download the models.
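
For anyone wanting to reproduce rough numbers like these, a minimal timing sketch built around the generate_image_from_text call that shows up in the tracebacks and Colab snippets elsewhere in this thread. The import path and the assumption that it returns a PIL image are mine, and this naive loop pays the model-loading cost on every call, so it measures the slower "cold" time.

    # Rough timing sketch for min-dalle; the function name and arguments are
    # taken from tracebacks in this thread, the import path is an assumption.
    import time
    from min_dalle.generate_image import generate_image_from_text

    for seed in (7, 8, 9):
        start = time.perf_counter()
        image = generate_image_from_text("alien life", seed=seed)
        elapsed = time.perf_counter() - start
        # Assuming a PIL image is returned, as display(image) in the Colab suggests.
        image.save(f"alien_life_{seed}.png")
        print(f"seed {seed}: {elapsed:.1f}s")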
You can CTRL+C that prompt and it'll download them anyway but it'll tell you that you can't visualize your results then.
Surely wandb is not a bare essential?
You can download the models yourself if you don't want to use it.
There are links in the readme, or you can run:

> wandb artifact get --root=./pretrained/dalle_bart_mini dalle-mini/dalle-mini/mini-1:v0

and select option (3)

Error: Project dalle-mini/dalle-mini does not contain artifact: "mini-1:v0"
Any way to download a pretrained dalle-mega model this way?

EDIT: wandb artifact get --root=./pretrained/dalle_bart_mega dalle-mini/dalle-mini/mega-1-fp16:v14

Note, it is a 4938MB download

I wish this requirement was in the README
Now we just need to containerise it (there are a few Docker Python NVIDIA images).
Good job. What are the hardware requirements?
Interesting when testing with inputs like "Oscar Wilde photo" or "Marilyn Monroe photo" and comparing to a Google image search. After some iterations we can get quite similar images, but the faces are always blurry.
That's intentional. When training the models, they filter out human faces and adult content among other things.
What is the maximum resolution possible with this?

If it depends on the hardware, what would be the limit when one rents the biggest machine available in the cloud?

Fixed size of 256x256. It cannot go any bigger or smaller.
Out of curiosity: why can't it be changed? I know nothing about this field so... thanks!
Transformers output fixed-length sequences. For this transformer they chose a 256x256 output: a fixed 16-by-16 grid of 256 "image tokens", each of which decodes to a 16-by-16 pixel "patch".

You can technically increase or decrease this, or use a different aspect ratio, by using more or fewer image tokens, but this is fixed once you start training. It would also require more "decodes" from the backbone VQGAN model (responsible for converting between image tokens and pixels), and thus take longer to run inference on.

CLIP-guided VQGAN can get around this by taking the average CLIP score over multiple "cutouts" of the whole image allowing for a broad range of resolutions and aspect ratios.
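
As a rough back-of-the-envelope sketch of why the resolution is baked in (the 16x16 token grid and 16-pixel patches are assumptions about DALL·E Mini's setup; the general point is just that grid size and patch size are fixed at training time):

    # The decoder emits a fixed-length sequence of image tokens and the VQGAN
    # decodes each token into a fixed-size patch, so the output resolution is
    # locked in at training time. Specific numbers below are assumptions.
    grid_side = 16      # image tokens per side
    patch_pixels = 16   # pixels per side contributed by each token
    tokens = grid_side ** 2
    resolution = grid_side * patch_pixels
    print(f"{tokens} image tokens -> {resolution}x{resolution} pixels")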

It's already being scaled up to 256x256 from something smaller anyway. You could add an extra upscaler to go further which I've tried with moderate success, but you're basically doing CSI style 'enhance' over and over.
Because that is how the network is trained. You could modify the network size and retrain to get different resolutions.
The Google Colab link works if you replace the computed path to flax_model.msgpack on line 10 in load_params.py with ‘/content/pretrained/vqgan/flax_model.msgpack’

Edit: actually it's easier to open a terminal and move /content/pretrained/vqgan to /content/min-dalle/pretrained/vqgan
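
If you'd rather apply that workaround from a notebook cell than a terminal, a minimal sketch (paths as described above; adjust if your Colab layout differs):

    # Move the cloned VQGAN weights to where load_params.py expects them.
    import os
    import shutil

    src = "/content/pretrained/vqgan"
    dst = "/content/min-dalle/pretrained/vqgan"
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if os.path.exists(src) and not os.path.exists(dst):
        shutil.move(src, dst)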

Thanks for figuring this out. The problem was that the vqgan repository was being cloned to the wrong directory. The updated colab should work now
The updated version gives me this (after successful setup with the example alien thing):

    UnfilteredStackTrace                      Traceback (most recent call last)
    <ipython-input-2-0e20e3adf861> in <module>()
          2
    ----> 3 image = generate_image_from_text("alien life", seed=7)
          4 display(image)

    67 frames
    UnfilteredStackTrace: TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float16, float32.

    The stack trace below excludes JAX-internal frames.
    The preceding is the original exception that occurred, unmodified.

    --------------------

    The above exception was the direct cause of the following exception:

    TypeError                                 Traceback (most recent call last)
    /content/min-dalle/min_dalle/models/dalle_bart_decoder_flax.py in __call__(self, decoder_state, keys_state, values_state, attention_mask, state_index)
         38             keys_state,
         39             self.k_proj(decoder_state).reshape(shape_split),
    ---> 40             state_index
         41         )
         42         values_state = lax.dynamic_update_slice(

    TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float16, float32.

You need to install flax 0.4.2. If you're using Colab you just open a terminal (icon in the bottom left of the screen) and run:
    pip3 install flax==0.4.2
Yes this is the fix for now. I need to address what is actually causing the dtype mismatch
   TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16.
That dinosaur image has fantastic meme potential
I couldn't get this to pick up my graphics card when running it with WSL 2; it just says no CUDA devices found or something, so I gave up. Not sure if anyone has had any luck.
Free idea: Same but for making short video clips, and then eventually producing entire movies.
Not sure why this is downvoted: seems like the inevitable endpoint of AI vision/image generation research and it warrants consideration and discussion.
Love this; before, I only ever saw code that ran this model through JAX. This seems to perform much better on my M1.
WTF is Wandb.ai? This seems like a sneaky way to get people to sign up for this wandb thingy.
Has anyone got this running on M1?
The results are amazingly poor. Try "biden plays chess against napoleon"
I tried a few non-descriptive statements from random tweets. As it turns out, nobody's made a random tweet since 2016, but for the few that exist, the results are great. E.g. "Good Morning Everyone , Happy Nice Day :D" generates something that can only be described as bored-ape meets Picasso in kindergarten. Probably the next-gen 1M$ NFT. If anybody needs proof that these models don't think, this is it.
What is the license of the generated artwork?
I think that decision refers to an attempt to get it copyrighted with the AI/computer program as the author.
Alternatively, you could try to hide the fact that something was created by an AI, and probably get away with it, but that isn't relevant to the legal question of whether something can be copyrighted when it is known to be created by an AI.
You pressed the button, it's owned by you, all rights are reserved by default.
This statement is not accurate[0].

> But copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the [human] mind.” COMPENDIUM (THIRD) § 306 (quoting Trade-Mark Cases, 100 U.S. 82, 94 (1879)); see also COMPENDIUM (THIRD) § 313.2 (the Office will not register works “produced by a machine or mere mechanical process” that operates “without any creative input or intervention from a human author” because, under the statute, “a work must be created by a human being”). So Thaler must either provide evidence that the Work is the product of human authorship or convince the Office to depart from a century of copyright jurisprudence.

[0] https://www.copyright.gov/rulings-filings/review-board/docs/...

This is literally irrelevant; this is about attempting to register art as owned BY the computer.

If you give the computer the instructions, such as “avocado chair”, the avocado chair is yours.

It wouldn’t be yours if it was something like a deep dream, i.e. if you ran the program with no input and generated a “random” work.

Please cite a source that confirms your claim, rather than stating it as a fact without evidence. You may be right, but I have no way of confirming that given the content of your comment.

The report I’ve cited makes a compelling argument against your claim, and several prominent organizations' copyright policies align with it.

Your quote says “without any creative input or intervention from a human author”.

Just hitting a "Generate random artwork" button indeed certainly doesn't seem to qualify as "creative input or intervention", but as for how DALL-E and consorts currently work, I'd say that coming up with a suitable prompt text, potentially refining it to get the output closer to what you want, curating the output, maybe using one of the output pictures as input for further processing, etc. all arguably constitute at least some amount of "creative input or intervention from a human author".

Is DALL·E significantly different from Adobe Photoshop in the eyes of the law? In both cases you use a software agent to create art. In fact, CGI art has existed for decades. Surely this is a settled question?
I think there might be some session leakage. I typed “A pig with a bowler hat.” and the model returned a picture of a half moon.
No matter what I typed, it always generated the same image of a moon half-covered in shadow. I think something might be a bit buggy with this.
It always generates an image of a moon for me.
Has anyone applied compression techniques to large models like DALL·E 2?
I don't think this is infringing on anyone's rights.
This project is done by volunteers unrelated to OpenAI.
They've started changing the name in some places as well to avoid this kind of confusion: they've renamed the app to Craiyon, as OpenAI asked them to. (https://www.craiyon.com/#headlessui-disclosure-button-7)
This stuff is so cool and it makes me happy that we're democratizing artistic ability. But I can't help but think RIP to all of the freelance artists out there. As these models become more mainstream and more advanced, that industry is going to be decimated.
No differently from translators etc.: not required for every small task, still required for doing things professionally.
Translators at least have official documents (aka the only times I see a translator in my life across 3 countries) because the government is retarded and needs someone with a title to translate "Name" and "Surname" on a birth certificate.

There is no equivalent for illustrators.

My friends who studied some specific language are all unemployed or doing unqualified jobs. Their peers from a generation before are teachers or work in some embassy.

That said, before some unicorn really starts doing some serious polishing, you'll still want some illustrators to piece art together. Taking the output of these models won't deliver a ready-made product easily.

Lots of professional translators moved into language tuition.

I guess lots of artists will move into teaching art.

I can see tools like this increasing public interest in making their own art with the help of new tools, and some will want to be taught.

I will rephrase it: if people today are willing to eat dirt instead of nourishment (and so on, for innumerable instances), to be contented with a lack of quality (with the accompanying acceptance of a consequent decline in the general perception of quality), the fault is more in decadence than in instruments.

You need well-cultivated intelligence to obtain a good product: if "anything goes" is the motto, if "cheap" is the "mandate", there lies the issue.

Only in the same manner that GPT-3 eliminates the need for writers. Or influencers remove the need for advertising.

That is, a surface-level view might show these things as equivalent, but the skills required to produce a decent result are not encapsulated in the averages that models contain.

I'm sure a lot of "content writers" for SEO spam will become obsolete. The content level is already rock bottom, so it is easily replaced by brainless machines.

But I'm more bothered by the societal effects when art is automated. I believe it'll expedite the effects we saw when the internet short-circuited the feedback loop for creators, killing any gaps where a non-revenue-optimizing, humane creative force could thrive. Not to mention the crazy mimetic positive feedback loops tearing the discourse apart.

I dunno. I've read a lot of GPT output. It lacks a certain consistency over medium scales. The big picture checks out, and the word-by-word grammar checks out, but the sentence-by-sentence information often isn't cohesive, or certain entity references change over time.

Text-to-image algos did the same thing for a while, but you look at the latest full-size DALL-E and it's pretty much flawless.

https://openai.com/dall-e-2/

If I were considering art school, I'd certainly be reconsidering my options. Maybe there are some defects in the output, but nothing photoshop can't fix.

I think where humans win out (for now) is where a high degree of specificity/precision is needed (e.g. graphic design). Or where certain legal requirements are present (AI art can't be copyrighted at this time), such as logo design.

Or most places where art is displayed and/or sold, because those places generally disparage purely digital art, and method is part of what goes into the valuation of the piece. "Oil on canvas" is worth more than "AI-assisted digital print", especially because duplicating it requires considerable effort.
Taste and empathy are tough to emulate.
Dall-e2 sometimes pulls through big time, though: https://www.reddit.com/r/dalle2/comments/vbtqkw/dalle_really...

But it's not going to tell you in clear words if your prompt was bad to begin with, like a human would, hopefully :).

Yeah, sometimes it pulls through, which means it still needs someone with taste and empathy to filter results.
I really don’t think that this will be the case anytime soon. Images can be generated from zany prompts, but making a coherent, fits-together-well set of images for a product like a web page or an illustrated book is far off.

Further, artists have a host of skills that DALL-E doesn’t, like “take that image, but change the colors a bit to make it more acceptable to the client, and move the cartoon bird a little further down”. Or “make an image that will look as good in a print as it does on a small screen”.

"an illustrated book is far off". Hi, just to mention that i'm using mini DALL-E for graphic novel experiments... Indeed not really a human quality but ... https://twitter.com/Dbddv01
