Oasist Blog

This blog features Linguistics, Engineering&Programming and Life Career.

Automation of Botpress Accuracy Inspection Vol.1 - CSV → JSON Convertor -



1. Background

I am working on inspecting the response accuracy of Botpress.
Some of the work was manual and tedious, so I automated the following tasks.

  • Generate a JSON file for Q&A import: covered in this article
  • Generate a matrix chart of confidence via the API with test data: covered in the next article

Please refer to the source code and README in my GitHub repository for how to build Botpress in your local environment (→ 5. Source Code).
This article explains how to generate the JSON file for Q&A import in Botpress.

2. Deliverables

The CSV of Q&A learning data must be converted to JSON and exported as a file that Botpress can import.
Here is the structure of the JSON.

{
    "qnas": [
        {
            "id": "{Serial_Num}",
            "data": {
                "action": "text",
                "contexts": [
                    "{context}"
                ],
                "enabled": true,
                "answers": {
                    "ja": [
                        "{Answers}"
                    ]
                },
                "questions": {
                    "ja": [
                        "{Questions1}",
                        "{Questions2}",
                        "{Questions3}",
                        "{Questions4}",
                        ...
                    ]
                },
                "redirectFlow": "",
                "redirectNode": ""
            }
        }
    ]
}

At least 3 Questions are required per Q&A to guarantee response confidence.
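
The three-question rule above can be checked before import. The following is only a minimal sketch: the function name find_sparse_qnas and the sample payload are hypothetical and not part of my repository, and the payload is trimmed to the keys the check needs.

```python
MIN_QUESTIONS = 3  # threshold stated in this article


def find_sparse_qnas(payload, lang="ja", minimum=MIN_QUESTIONS):
    """Return the ids of Q&A entries that have fewer than `minimum` questions."""
    sparse = []
    for entry in payload["qnas"]:
        questions = entry["data"]["questions"].get(lang, [])
        if len(questions) < minimum:
            sparse.append(entry["id"])
    return sparse


# Illustrative payload only, not real learning data
payload = {
    "qnas": [
        {"id": "1", "data": {"questions": {"ja": ["q1", "q2", "q3"]}}},
        {"id": "2", "data": {"questions": {"ja": ["q1"]}}},
    ]
}
print(find_sparse_qnas(payload))
```

Entries reported by such a check would need more Questions rows in the CSV before conversion.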

3. Implementation

3-1. Read CSV

The CSV to import has the following format.

  • Place the headers Serial_Nums, Questions and Answers
  • Remove HTML tags and quotes to avoid errors raised by Python's CSV library
  • Repeat Serial_Nums and Answers on every row (Serial_Nums & Answers : Questions = 1:N).
  • For details, check the Sample

This CSV is read line by line, each row as an Array or List.

  • In Python, I accessed the columns by index, like qna[0].
# Sample
import csv

with open(csv_path) as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for qna in reader:
        print(qna[0])  # Serial_Nums
        print(qna[1])  # Questions
        print(qna[2])  # Answers
  • In Ruby, I accessed the columns explicitly by header name with the headers option.
# Sample
require 'csv'

CSV.foreach(csv_path, headers: true) do |qna|
  p qna['Serial_Nums']
  p qna['Questions']
  p qna['Answers']
end
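
For comparison, Python can also access columns by header name with csv.DictReader from the standard library, much like the Ruby version. This is a minimal sketch rather than the code in my repository: the function name read_rows and the sample rows are made up for illustration.

```python
import csv
import io

# Illustrative rows only; the real CSV is not reproduced here
sample_csv = io.StringIO(
    "Serial_Nums,Questions,Answers\n"
    "1,How do I reset my password?,Open the settings page.\n"
    "1,I forgot my password,Open the settings page.\n"
)


def read_rows(f):
    """Return each CSV row as a dict keyed by the header names."""
    return list(csv.DictReader(f))


rows = read_rows(sample_csv)
for row in rows:
    print(row["Serial_Nums"], row["Questions"], row["Answers"])
```

With DictReader the header row is consumed automatically, so the explicit next(reader) call is not needed.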

3-2. Generate Q&A Array or List

First, implement the function (Python) or method (Ruby) that initialises the Dict or Hash in Botpress format.

def gen_dict_template():
    return {
      "id": "",
      "data": {
        "action": "text",
        "contexts": [
          "hoge"
        ],
        "enabled": True,
        "answers": {
          "ja": ["hoge"]
        },
        "questions": {
          "ja": []
        },
        "redirectFlow": "",
        "redirectNode": ""
      }
    }

def gen_hash_template
  {
    id: '',
    data: {
      action: 'text',
      contexts: [
        'hoge'
      ],
      enabled: true,
      answers: {
        ja: []
      },
      questions: {
        ja: []
      },
      'redirectFlow': '',
      'redirectNode': ''
    }
  }
end

Second, prepare an empty Array or List to hold the Q&A Dicts or Hashes.
Third, assign Serial_Nums, Questions and Answers to the corresponding keys in the Q&A Dict or Hash.
Answers are repeated across rows in the CSV and are stored in an Array or List, so the conditional statement must be as follows.

  • The previous Answers element is the same as the one just loaded
    • The Questions element is appended to Questions in the current Q&A Dict or Hash
  • The previous Answers element is NOT the same as the one just loaded
    • A new Q&A Dict or Hash is created, Serial_Nums is assigned to its id, and Questions and Answers are appended to the corresponding keys

This process runs line by line: each Q&A Dict or Hash is added to the empty Array or List, duplicate Questions and duplicate entries are removed, and the result is returned.

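import csv

# uniq_list is a de-duplication helper that is not shown in this article
def gen_qnas_list(csv_path):
    qnas = []
    # Named qna_dict rather than dict to avoid shadowing the built-in type
    qna_dict = gen_dict_template()
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for qna in reader:
            if qna_dict["data"]["answers"]["ja"][-1] == qna[2]:
                # Same answer as the previous row: add the question only
                qna_dict["data"]["questions"]["ja"].append(qna[1])
            else:
                # New answer: start a new Q&A entry
                qna_dict = gen_dict_template()
                qna_dict["id"] = qna[0]
                qna_dict["data"]["questions"]["ja"].append(qna[1])
                qna_dict["data"]["answers"]["ja"].remove("hoge")
                qna_dict["data"]["answers"]["ja"].append(qna[2])
            qna_dict["data"]["questions"]["ja"] = uniq_list(qna_dict["data"]["questions"]["ja"])
            qnas.append(qna_dict)
    return uniq_list(qnas)

def gen_qnas_arr(csv_path)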
  qnas = []
  hash = gen_hash_template
  CSV.foreach(csv_path, headers: true) do |qna|
    if hash[:data][:answers][:ja].last == qna['Answers']
      hash[:data][:questions][:ja] << qna['Questions']
    else
      hash = gen_hash_template
      hash[:id] = qna['Serial_Nums']
      hash[:data][:questions][:ja] << qna['Questions']
      hash[:data][:answers][:ja] << qna['Answers']
    end
    hash[:data][:questions][:ja].uniq!
    qnas << hash
  end
  qnas.uniq
end
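
The grouping logic can also be tried without a file or Botpress itself. The following is a simplified sketch, not the code in my repository: group_qnas is a hypothetical name, the entries keep only id, questions and answers instead of the full Botpress format, and this uniq_list is just one possible implementation of the helper. It tracks the previous answer in a variable instead of using a placeholder element.

```python
import csv
import io


def uniq_list(items):
    """Order-preserving de-duplication that also works for unhashable items."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def group_qnas(f):
    """Group consecutive rows that share the same answer into one entry."""
    qnas = []
    previous_answer = None
    for row in csv.DictReader(f):
        if row["Answers"] != previous_answer:
            # A new answer starts a new entry
            qnas.append({
                "id": row["Serial_Nums"],
                "questions": [],
                "answers": [row["Answers"]],
            })
            previous_answer = row["Answers"]
        qnas[-1]["questions"].append(row["Questions"])
    for entry in qnas:
        entry["questions"] = uniq_list(entry["questions"])
    return qnas


# Illustrative data: the question "a" is duplicated on purpose
sample_csv = io.StringIO(
    "Serial_Nums,Questions,Answers\n"
    "1,a,X\n"
    "1,b,X\n"
    "1,a,X\n"
    "2,c,Y\n"
)
grouped = group_qnas(sample_csv)
print(grouped)
```

Because each entry is appended only once, when its answer first appears, the final de-duplication of the whole list is not needed in this variant.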

3-3. Convert Dict and Hash to JSON

Assign the Q&A Array or List as the value of the qnas key in a Dict or Hash, and store it in a variable.

def csv_to_dict(self):
    self.obj = { "qnas": gen_qnas_list(self.csv_path) }
    return self.obj["qnas"]

def csv_to_hash
  @obj = { qnas: gen_qnas_arr(csv_path) }
end

Convert the Dict or Hash to JSON, specify a path and export it as a JSON file.

def dict_to_json(self):
    write_json(self.json_path, self.obj)

def hash_to_json
  write_json(json_path, @obj)
end
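
The write_json helper is not shown in this article. One possible shape for it, purely as an assumption, uses json.dump from the standard library with ensure_ascii=False so that Japanese text stays readable in the exported file.

```python
import json
import os
import tempfile


def write_json(json_path, obj):
    """Write obj to json_path as UTF-8 JSON, keeping non-ASCII text as-is."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=4)


# Round trip through a temporary file with illustrative data
obj = {"qnas": [{"id": "1", "data": {"answers": {"ja": ["こんにちは"]}}}]}
path = os.path.join(tempfile.mkdtemp(), "qna.json")
write_json(path, obj)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded == obj)
```

Without ensure_ascii=False, json.dump escapes Japanese characters as \uXXXX sequences, which is still valid JSON but harder to review by eye.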

4. Conclusion

Botpress uses n-grams instead of morphological analysis for feature extraction and needs tuning, so many of the surrounding tools have to be written in Python.
That is why I converted my Ruby scripts to Python.

I wrote Python for the first time in months, so I had somewhat forgotten how.
Calling methods and functions is not as intuitive as in Ruby, and it was tedious to import libraries even for basic features.
What was more challenging was that Python raises exceptions for some calls that Ruby simply evaluates to nil.
For example, Ruby returns nil when you access a non-existent index, whereas Python raises IndexError, which halts the process.
That is why I had to give Answers a different initial value in the Python Q&A Dict template than in the Ruby Hash template, to avoid such halts.
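
The difference is easy to reproduce in isolation. In the sketch below, last_or_none is a hypothetical helper, not part of my scripts, that imitates what Ruby's Array#last does for an empty array.

```python
def last_or_none(items):
    """Return the last element, or None when the list is empty."""
    return items[-1] if items else None


answers = []
try:
    answers[-1]  # indexing an empty list
    raised = False
except IndexError:
    raised = True

print(raised)
print(last_or_none(answers))
print(last_or_none(["hoge"]))
```

A helper like this is an alternative to seeding the list with a placeholder element such as "hoge".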

Please do not get me wrong.
Python is wonderful: it is rich in libraries that Ruby does not have, which surprised me when I built an NLP application.
For deep learning or Natural Language Processing, Python is the only realistic choice.

As homework for myself, I need to write more and more Python, and I will.

5. Source Code