Row 5505


Content Data


Hi. I'm trying to do ASR on LibriSpeech data and am confused about the data processing. My pipeline so far:

1. Read the .flac files from LibriSpeech.
2. Normalise the audio: (x - x.mean()) / x.std().
3. Convert to a mel spectrogram (n_mels = 128).
4. Convert to dB scale (top_db = 80).
5. Normalise the spectrogram. How? Using torchvision? Does anyone know the mean/std? I calculated them over the dataset and got mean, std = -0.2, 1.1. This doesn't seem right. Can someone verify, please?

And how do I normalise this spectrogram? With torchvision.transforms.Normalize()?
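One possible way to approach step 5 without torchvision is sketched below, assuming a single global mean and standard deviation are wanted across all spectrogram elements. The helper names dataset_mean_std and normalise are hypothetical, and the random tensors are stand-ins for real dB-scale spectrograms. torchvision.transforms.Normalize applies the same (x - mean) / std arithmetic per channel, so plain tensor arithmetic gives an equivalent result for a single-channel spectrogram.

```python
import torch


def dataset_mean_std(specs):
    """Global mean and std over every element of an iterable of spectrogram tensors."""
    count = 0
    total = torch.zeros((), dtype=torch.float64)
    total_sq = torch.zeros((), dtype=torch.float64)
    for spec in specs:
        spec = spec.to(torch.float64)  # accumulate in float64 to limit rounding error
        count += spec.numel()
        total += spec.sum()
        total_sq += (spec ** 2).sum()
    mean = total / count
    # Population variance: E[x^2] - E[x]^2, clamped in case of tiny negative rounding.
    std = (total_sq / count - mean ** 2).clamp_min(0).sqrt()
    return mean.item(), std.item()


def normalise(spec, mean, std, eps=1e-8):
    """Standardise a spectrogram with precomputed dataset statistics."""
    return (spec - mean) / (std + eps)


torch.manual_seed(0)
# Hypothetical stand-ins for dB-scale spectrograms with different numbers of frames.
specs = [torch.randn(1, 128, n) * 15.0 - 40.0 for n in (90, 120, 75)]
mean, std = dataset_mean_std(specs)
normalised = [normalise(s, mean, std) for s in specs]
```

Accumulating sums over elements, rather than averaging per-file means, keeps utterances of different lengths weighted by their number of frames. The statistics would normally be computed on the training split only and then reused for validation and test data.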

There aren't many resources online about ASR, and the ones that exist conflict in what they do.

Field	Value
label	r/pytorch
dataType	post
communityName	r/pytorch
datetime	2024-04-30
username_encoded	Z0FBQUFBQm5LakwyY0F6R1Vkd1d0bjlrX1NXQ0htdDhzcHpYQ1NhTUlhUkJ5MUdTa21sTThSejkwZkI2aThmWFFyYzh1SVVpWDVNTlYzXzJGVkhaNlRGck5BUWxNbXhDNEE9PQ==
url_encoded	Z0FBQUFBQm5Lak9Gd1pLck56c0R5UUJKclhrdmhBR2hScDVNendxcUVvX2ZGZVJWaTdzQTlEbWpDTkMyS1J4czN5Rm1zVjlpaDVkRTQxSHlwaGpMd0dQbUQ4b2pGeTlwQ1VIQmFFR1FQel9CU1M2NXRzTE90VUdaU2J1WDczcWtuSC1qS2x4ZEFhNXFsdDdjdGdfTUpONXBfdU4ybXFyelBuSjYxZDFsZUVhMTBOdG1GTktPRHJhWmU1VWdYNE5iRWJibmxJdG54RFF4ZHo4S2kyOTA5QWRMRkVLampVaDRPUT09


Entry Information

Entry ID: 5505
Repository: Axioma AXP
Dataset: arrmlet/reddit_dataset_36
Total Entries: 100,000