Data:
{
"text": "Hello community. I have tried a lot of different AI models for various tasks. For example, let's overview a few of them:\n\n1. Diffusion Models. They work by taking random noise and your text and, based on the trained data, they are able to make the image you want (text-to-image). But this isn't an art! I want AI to open an application for drawing I like, either Adobe Photoshop, Pixelmator Pro, or even ProCreate. Create a canvas, select a tool, and start DRAWING! And yes, I should be able to tell in the middle for the proceeds how to do, and the model should be able to explain her decisions!\n\n2. TTS (text-to-speech) and Suno (text-to-music). The the actual f\\*ck are you doing? I don't want to see diffusion-based models even here or any Mel spectrograms or Midi-transformer generation. Can you make an AI with a hardcoded voice per every possible sound (check out what are phonetics)? When she wants to talk, she will control the tone and other parameters. If singing, just compress or make longer the sounds\n\n3. LSTM, Model merging, Multimodality. Why not? Why can't I take Whisper, LLaMA, Coqui, Diffusion, and Moondream and combine them into a single model? Why can't I take a 7B model and upscale it to 10B while filling in new data on the added params? Why can't I just say to the model your name is \"Yuna\" for any context and tweak parameters only for this, without fine-tuning with catastrophic forgetting?",
"label": "r/chatgpt",
"dataType": "post",
"communityName": "r/ChatGPT",
"datetime": "2024-05-23",
"username_encoded": "Z0FBQUFBQm5Lak1jWWp1MkF5cGVhSVlCNjFwLUJ3cFYybm5EZklKTUJPdE0tWEp2eGxRcHJCVzRzekQxbDl1cHF6NWhRWF9pUTJtWERpYXNkbHZrS3RCdE1JNTBJalhXb2c9PQ==",
"url_encoded": "Z0FBQUFBQm5Lak9zM1g2Y0p0bENQZnMySHNaWmdDb08yRDBFbi1fMXhZeHpua1ViUm52d1JSSHFNUTVDRE1TMnVRV01WcEdpOUIya1c2eXVvYWFoY1ZyREhMckhMb0thZk9uYkg2YXFsUHhLYUNTY2s1Q3dweXFyc2p1d0pINHRjck9YMlI3eERWYTFyREF1V3ZtNF9udTZKdF9MeVRHQVMxNTJqb21yYXdGRTdQWGo4MW9VWldqU0Y4SkpLZEJ1ZE5LZjZtU01wT01l"
}
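
Point 1 of the quoted post gives a one-line description of how diffusion models turn random noise plus a prompt into an image. A minimal sketch of that process, assuming the Hugging Face `diffusers` library and the `runwayml/stable-diffusion-v1-5` checkpoint (neither is named in the post):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image diffusion pipeline
# (the checkpoint name is an illustrative assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Starting from random noise, the pipeline iteratively denoises while
# conditioning on the text prompt until an image emerges.
image = pipe("a fox painted in watercolor").images[0]
image.save("fox.png")
```

The denoising loop is exactly what the post objects to: the image appears from noise in one shot, with no canvas, tool selection, or explainable intermediate strokes.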
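Point 2 proposes a concatenative approach: one hardcoded recording per phoneme, with tone and duration controlled at synthesis time. A toy sketch of that idea follows; the phoneme WAV paths and ARPAbet symbols are hypothetical, and a real concatenative system would join units far more carefully (e.g., with PSOLA) than this naive resample:

```python
import numpy as np
import soundfile as sf  # assumed dependency for WAV I/O

SAMPLE_RATE = 22050

def load_phoneme(ph: str) -> np.ndarray:
    # One pre-recorded mono clip per phoneme, e.g. "phonemes/AH.wav"
    # (hypothetical file layout).
    audio, _ = sf.read(f"phonemes/{ph}.wav")
    return audio

def stretch(audio: np.ndarray, factor: float) -> np.ndarray:
    # Naive resampling stretch: factor > 1 lengthens a sound (singing),
    # factor < 1 compresses it. A production system would use PSOLA or
    # a phase vocoder to keep the pitch stable.
    positions = np.arange(0, len(audio) - 1, 1.0 / factor)
    return np.interp(positions, np.arange(len(audio)), audio)

def speak(phonemes: list[str], factors: list[float]) -> np.ndarray:
    # Concatenate the hardcoded clips, stretching each one as requested.
    return np.concatenate(
        [stretch(load_phoneme(p), f) for p, f in zip(phonemes, factors)]
    )

# "Hello" as ARPAbet phonemes, with the vowels held longer as if sung.
wave = speak(["HH", "AH", "L", "OW"], [1.0, 1.5, 1.0, 2.0])
sf.write("hello.wav", wave, SAMPLE_RATE)
```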
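Point 3's "7B to 10B" question resembles what is sometimes called depth up-scaling (used, for example, for SOLAR 10.7B): duplicate some transformer layers, then continue pretraining so the copied parameters absorb new data. A rough sketch against a LLaMA-style model from `transformers`; the checkpoint name and the choice of which layers to copy are assumptions:

```python
import copy
from transformers import AutoModelForCausalLM

# Load a LLaMA-style 7B model (the checkpoint name is illustrative).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
layers = model.model.layers  # the ModuleList of decoder blocks

# Duplicate the middle third of the stack to grow the parameter count.
start, end = len(layers) // 3, 2 * len(layers) // 3
for i in range(start, end):
    layers.append(copy.deepcopy(layers[i]))
model.config.num_hidden_layers = len(layers)

# A real implementation would also renumber per-layer attention indices
# and, crucially, continue pretraining on fresh data so the duplicated
# parameters "fill in" with new knowledge.
print(f"decoder blocks after up-scaling: {len(layers)}")
```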
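For the "Yuna" persona without catastrophic forgetting, the usual lightweight route today is a system prompt or a small adapter trained with the base weights frozen. A hedged sketch using the `peft` library's LoRA support; the target module names fit LLaMA-style models and are an assumption:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)

# Only the small adapter matrices are trainable; the original 7B
# parameters stay frozen, so the base knowledge is not overwritten.
model.print_trainable_parameters()
```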