Row ID: 45558 | Dataset Entry | Axioma AXP Content Repository

Content Data

This page contains data entry 45558 from the Axioma AXP content repository. The structured data below represents the complete record for this entry.

They have also been training only on a subset of internet data. I would suspect someone like Google could easily 10x the training data overnight if needed.

You are right. I think the next evolution (copyright challenges aside) would be to train a model on all known human history, from the beginning of time: historic artwork, documents, literature, etc. Then, as we approach the modern age, continue to train the model on every motion picture, modern book, television program, news broadcast, newspaper article, etc. Then continue to train the model on YouTube videos, podcasts, and content created by non-professional humans. Next, continue to train the model on real-time live data sources like home assistants, public microphones, public cameras, and boardroom meeting transcriptions. Finally, add real-time model enhancement from real-time model interactions, e.g. when you interact with the model.

The future will tell how much data can be ingested, and how quickly. Theoretically, if the model can be trained on all of humanity, past and present, it will be a very significant force to reckon with across almost any domain. Pair that with rapidly improving android and sensor capabilities, and it's going to be a crazy ride.

Field	Value
text	They have also been training only on a subset of internet data. I would suspect someone like Google could easily 10x the training data overnight if needed. You are right, I think the next evolution (copyright challenges aside) would be to train a model on all known human history, from the beginning of time. Historic artwork, documents, literature, etc. Then, as we approach the modern age, continue to train the model on every motion picture, modern book, television program, news broadcast, newsp…
label	r/openai
dataType	comment
communityName	r/OpenAI
datetime	2024-05-22
username_encoded	Z0FBQUFBQm5Lak1QWEtITUhmdHZDM1RYaFpIb0FnSWlvRVpWSE11ckR0YzVNVmZFRmk3TXBQVE55b3N3Q2xPRUx6UjZSc00ybjNhclk4MDY4TXZzRVdoSnVPdWlrUWYycVE9PQ==
url_encoded	Z0FBQUFBQm5Lak9mRGxBbkxsekdjdXdkcHpvYUcyMG5pbHRZWUs2b2Q4U0wwZXYweVgxR3pBMTdmVUZzY0NvUmlJd1pzS2VxcUQxTFNlZmt5UWxOWDJCT1BQaWFHU01Bd1BLNnJPRDVRMFhLT3lkbmhyWFltMWtKeWJKUzhlR1J2eFRSSkx6UGpRMzlKNXBnLU1CNnpFV0ZBSHFOR2J6eUFvSHZVTDRhckJpbnVyQXVUU25RcVJRUEp5UnJLVWxENVJ4T2VEaFVTYnVO

Raw Record

{
  "text": "They have also been training only on a subset of internet data. I would suspect someone like Google could easily 10x the training data overnight if needed.\n\nYou are right, I think the next evolution (copyright challenges aside) would be to train a model on all known human history, from the beginning of time. Historic artwork, documents, literature, etc. Then, as we approach the modern age, continue to train the model on every motion picture, modern book, television program, news broadcast, newspaper article, etc.  THEN, continue to train the model on YouTube videos, Podcasts, and content created by non-professional humans. Finally, continue to train the model on real-time live data sources like home assistants, public microphones, public cameras, boardroom meeting transcriptions. FINALLY, real-time model enhancement with real-time model interactions e.g. when you interact with the model.\n\nThe future will tell how much data can be ingested, and how quickly. Theoretically, if the model can be trained on all of humanity, past and present, it will be a very significant force to recon with across almost any domain. Pair that with rapidly improving android and sensor capabilities, and it's going to be a crazy ride.",
  "label": "r/openai",
  "dataType": "comment",
  "communityName": "r/OpenAI",
  "datetime": "2024-05-22",
  "username_encoded": "Z0FBQUFBQm5Lak1QWEtITUhmdHZDM1RYaFpIb0FnSWlvRVpWSE11ckR0YzVNVmZFRmk3TXBQVE55b3N3Q2xPRUx6UjZSc00ybjNhclk4MDY4TXZzRVdoSnVPdWlrUWYycVE9PQ==",
  "url_encoded": "Z0FBQUFBQm5Lak9mRGxBbkxsekdjdXdkcHpvYUcyMG5pbHRZWUs2b2Q4U0wwZXYweVgxR3pBMTdmVUZzY0NvUmlJd1pzS2VxcUQxTFNlZmt5UWxOWDJCT1BQaWFHU01Bd1BLNnJPRDVRMFhLT3lkbmhyWFltMWtKeWJKUzhlR1J2eFRSSkx6UGpRMzlKNXBnLU1CNnpFV0ZBSHFOR2J6eUFvSHZVTDRhckJpbnVyQXVUU25RcVJRUEp5UnJLVWxENVJ4T2VEaFVTYnVO"
}

Entry Information

Entry ID: 45558
Repository: Axioma AXP
Dataset: arrmlet/reddit_dataset_36
Total Entries: 100,000