Row 45558
Content Data
This page contains data entry 45558 from the Axioma AXP content repository. The structured data below represents the complete record for this entry.
They have also been training only on a subset of internet data. I would suspect someone like Google could easily 10x the training data overnight if needed.
You are right. I think the next evolution (copyright challenges aside) would be to train a model on all known human history, from the beginning of time: historic artwork, documents, literature, etc. Then, as we approach the modern age, continue to train the model on every motion picture, modern book, television program, news broadcast, newspaper article, etc. THEN, continue to train the model on YouTube videos, podcasts, and content created by non-professional humans. After that, continue to train the model on real-time live data sources like home assistants, public microphones, public cameras, and boardroom meeting transcriptions. FINALLY, add real-time model enhancement from real-time model interactions, e.g. when you interact with the model.
The future will tell how much data can be ingested, and how quickly. Theoretically, if the model can be trained on all of humanity, past and present, it will be a very significant force to reckon with across almost any domain. Pair that with rapidly improving android and sensor capabilities, and it's going to be a crazy ride.
| Field | Value |
|---|---|
| text | They have also been training only on a subset of internet data. I would suspect someone like Google could easily 10x the training data overnight if needed. You are right, I think the next evolution (copyright challenges aside) would be to train a model on all known human history, from the beginning of time. Historic artwork, documents, literature, etc. Then, as we approach the modern age, continue to train the model on every motion picture, modern book, television program, news broadcast, newsp… |
| label | r/openai |
| dataType | comment |
| communityName | r/OpenAI |
| datetime | 2024-05-22 |
| username_encoded | Z0FBQUFBQm5Lak1QWEtITUhmdHZDM1RYaFpIb0FnSWlvRVpWSE11ckR0YzVNVmZFRmk3TXBQVE55b3N3Q2xPRUx6UjZSc00ybjNhclk4MDY4TXZzRVdoSnVPdWlrUWYycVE9PQ== |
| url_encoded | Z0FBQUFBQm5Lak9mRGxBbkxsekdjdXdkcHpvYUcyMG5pbHRZWUs2b2Q4U0wwZXYweVgxR3pBMTdmVUZzY0NvUmlJd1pzS2VxcUQxTFNlZmt5UWxOWDJCT1BQaWFHU01Bd1BLNnJPRDVRMFhLT3lkbmhyWFltMWtKeWJKUzhlR1J2eFRSSkx6UGpRMzlKNXBnLU1CNnpFV0ZBSHFOR2J6eUFvSHZVTDRhckJpbnVyQXVUU25RcVJRUEp5UnJLVWxENVJ4T2VEaFVTYnVO |
Raw Record
{
"text": "They have also been training only on a subset of internet data. I would suspect someone like Google could easily 10x the training data overnight if needed.\n\nYou are right, I think the next evolution (copyright challenges aside) would be to train a model on all known human history, from the beginning of time. Historic artwork, documents, literature, etc. Then, as we approach the modern age, continue to train the model on every motion picture, modern book, television program, news broadcast, newspaper article, etc. THEN, continue to train the model on YouTube videos, Podcasts, and content created by non-professional humans. Finally, continue to train the model on real-time live data sources like home assistants, public microphones, public cameras, boardroom meeting transcriptions. FINALLY, real-time model enhancement with real-time model interactions e.g. when you interact with the model.\n\nThe future will tell how much data can be ingested, and how quickly. Theoretically, if the model can be trained on all of humanity, past and present, it will be a very significant force to recon with across almost any domain. Pair that with rapidly improving android and sensor capabilities, and it's going to be a crazy ride.",
"label": "r/openai",
"dataType": "comment",
"communityName": "r/OpenAI",
"datetime": "2024-05-22",
"username_encoded": "Z0FBQUFBQm5Lak1QWEtITUhmdHZDM1RYaFpIb0FnSWlvRVpWSE11ckR0YzVNVmZFRmk3TXBQVE55b3N3Q2xPRUx6UjZSc00ybjNhclk4MDY4TXZzRVdoSnVPdWlrUWYycVE9PQ==",
"url_encoded": "Z0FBQUFBQm5Lak9mRGxBbkxsekdjdXdkcHpvYUcyMG5pbHRZWUs2b2Q4U0wwZXYweVgxR3pBMTdmVUZzY0NvUmlJd1pzS2VxcUQxTFNlZmt5UWxOWDJCT1BQaWFHU01Bd1BLNnJPRDVRMFhLT3lkbmhyWFltMWtKeWJKUzhlR1J2eFRSSkx6UGpRMzlKNXBnLU1CNnpFV0ZBSHFOR2J6eUFvSHZVTDRhckJpbnVyQXVUU25RcVJRUEp5UnJLVWxENVJ4T2VEaFVTYnVO"
}
Entry Information
- Entry ID: 45558
- Repository: Axioma AXP
- Dataset: arrmlet/reddit_dataset_36
- Total Entries: 100,000