Row Details #5505

Data:

{
  "text": "Hi. I'm trying to do ASR on LibriSpeech data and am confused about the data processing. My current pipeline:\n\n1. Read the .flac files from LibriSpeech\n2. Normalise the audio: (x - x.mean()) / x.std()\n3. Convert to a mel spectrogram (n_mels = 128)\n4. Convert to dB scale (top_db = 80)\n5. Normalise the spectrogram (how? Using torchvision? Does anyone know the mean/std? I calculated them for the dataset and got mean, std = -0.2, 1.1. That doesn't seem right. Can someone verify, please?)\n\nAnd how do I normalise this spectrogram? Using torchvision.transforms.Normalize()?\n\nThere aren't many resources online about ASR, and the ones that exist conflict in what they do.\n",
  "label": "r/pytorch",
  "dataType": "post",
  "communityName": "r/pytorch",
  "datetime": "2024-04-30",
  "username_encoded": "Z0FBQUFBQm5LakwyY0F6R1Vkd1d0bjlrX1NXQ0htdDhzcHpYQ1NhTUlhUkJ5MUdTa21sTThSejkwZkI2aThmWFFyYzh1SVVpWDVNTlYzXzJGVkhaNlRGck5BUWxNbXhDNEE9PQ==",
  "url_encoded": "Z0FBQUFBQm5Lak9Gd1pLck56c0R5UUJKclhrdmhBR2hScDVNendxcUVvX2ZGZVJWaTdzQTlEbWpDTkMyS1J4czN5Rm1zVjlpaDVkRTQxSHlwaGpMd0dQbUQ4b2pGeTlwQ1VIQmFFR1FQel9CU1M2NXRzTE90VUdaU2J1WDczcWtuSC1qS2x4ZEFhNXFsdDdjdGdfTUpONXBfdU4ybXFyelBuSjYxZDFsZUVhMTBOdG1GTktPRHJhWmU1VWdYNE5iRWJibmxJdG54RFF4ZHo4S2kyOTA5QWRMRkVLampVaDRPUT09"
}
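
The normalisation steps in the post (steps 2, 4 and 5) come down to plain arithmetic, so here is a minimal sketch of them in NumPy. It is an illustration under stated assumptions, not a recommended ASR recipe: the helper names (normalize_waveform, power_to_db, normalize_spectrogram), the eps and amin constants, and the random stand-in for a mel power spectrogram are all hypothetical. In a PyTorch pipeline the same expressions can be written directly on tensors, so an image-oriented transform is not strictly needed for step 5.

```python
import numpy as np


def normalize_waveform(x, eps=1e-8):
    """Zero-mean, unit-variance scaling of a 1-D waveform (step 2)."""
    # Note the parentheses: subtract the mean first, then divide by the std.
    return (x - x.mean()) / (x.std() + eps)


def power_to_db(power, top_db=80.0, amin=1e-10):
    """Convert a power spectrogram to decibels and clamp its range (step 4)."""
    db = 10.0 * np.log10(np.maximum(power, amin))
    # Everything more than top_db below the peak is raised to (peak - top_db).
    return np.maximum(db, db.max() - top_db)


def normalize_spectrogram(spec_db, mean=None, std=None, eps=1e-8):
    """Standardize a dB spectrogram (step 5).

    Uses per-utterance statistics unless dataset-level mean/std are passed in.
    """
    mean = spec_db.mean() if mean is None else mean
    std = spec_db.std() if std is None else std
    return (spec_db - mean) / (std + eps)


# Synthetic stand-ins: one second of "audio" at 16 kHz and a fake
# (n_mels=128, frames=50) mel power spectrogram. Real data would come from
# the .flac files and a mel filterbank instead.
rng = np.random.default_rng(0)
wave = rng.normal(loc=0.3, scale=2.0, size=16000)
fake_mel_power = rng.random((128, 50)) + 1e-3

wave_norm = normalize_waveform(wave)
spec_db = power_to_db(fake_mel_power, top_db=80.0)
spec_norm = normalize_spectrogram(spec_db)

# The unparenthesized form only shifts the signal by mean/std,
# so the result is neither zero-mean nor unit-variance.
wave_shifted_only = wave - wave.mean() / wave.std()
```

Whether to use per-utterance statistics (the default here) or dataset-level statistics passed in as mean and std is a design choice; the sketch supports both so the two can be compared.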