14. Call from Macau

2021-02-28

Zen (Haseda Mitsuki)

I really made a stupid mistake.

I should have just turned off my cell phone. Thinking it might be a call from my mother, I made the mistake of answering. I don’t know why I couldn’t turn the phone off. I went on to regret it for the rest of my life.

Then again, considering what happened afterward, maybe I had no reason to regret it. Anyway, if I had not crossed the sea, whether to Hong Kong or Macau, I might not have taken Seomi back to Hisa.

Anyway, let me talk about that international call and the terrible situation behind it.

The person who called was Sawa. She said urgently that something terrible had happened: Yeoul had attempted suicide by cutting her wrists at a hotel in Macau.

The circumstances were as follows.

When I packed my luggage and left the house for Korea, Sawa proposed a trip to Macau. It was a so-called double-date trip: Sawa and her boyfriend, and the engaged couple.

While Sawa, her boyfriend, and Jasper played in the hotel pool in Macau, Yeoul stayed lying in her room the whole time, saying her head hurt for some reason. Her fiancé, Jasper, driven out of their room in the middle of the night, had to stay in the room of Sawa and her boyfriend until late. Sawa, sensing that something was wrong, went to Yeoul’s room late that night, but the door was locked.

When the hotel staff, summoned by Sawa and her boyfriend, opened the door, Yeoul had already collapsed, bleeding. Of course, she was immediately taken to the hospital in an ambulance.

Of course, I heard all this from Sawa later. At the time of the call, I heard only that Yeoul had attempted suicide.

So, although I felt I shouldn’t, I left in a hurry. As the event drew to a close, the large crowd began to scatter. It was already late at night, and for a while I couldn’t think of any way to get out of there, when I heard a voice from behind.

“Excuse me, aren’t you the Japanese customer I drove earlier?”

Fortunately, it was the driver of the taxi I had taken before. I had the taxi company’s number on my phone and had just been about to call a taxi. Perhaps he had stayed to watch the event and was slowly making his way down.

“Good timing. I’m on my way back now. Can you give me a ride?”

“Yes. I’ll take you. By the way, what about your party?”

Perhaps by now, Hisa met Sumi. Then he joined with a knife and was hard to persuade the Sumi. It was almost like a child of Sumi’s friend.

When I think of this beauty, her heart hurts strangely. Her heart was stiff.

Like her sword, she also loved Lee. But I didn’t think she could be mine for some reason. When she meets a knife, she feels strong, and she must be mine, contrary to her conviction.

Maybe Hisa felt that way.

She knew that Hisa liked the sword very much, but she left alone. It was because they were intuitive that they would not go beyond the line. Anyway, when I thought I could not do anything in that situation, I turned my head and answered a taxi driver.

“Something came up, so I’m leaving alone. Please hurry.”

In the taxi heading to Andong Intercity Bus Terminal, the driver asked me, as if he had just remembered something.

“You’re obviously Japanese, yet you speak Korean well?”

“My mother is Korean. I’m mixed.”

“Ah, I see. If your mother is Korean, you’re Korean. I know it’s rude to ask, but are you a man or a woman?”

I am often asked this question. In such cases I usually read the situation and give whichever answer works to my advantage. But today I couldn’t choose a side. After thinking for a while, I answered.

“I was originally a man. I had gender reassignment surgery.”

In fact, it is the opposite. Although I was born in a woman’s body, during my long years in Hong Kong my sense of being a woman had disappeared completely.

“Ah, I see. It’s okay. I’ve been to Thailand before. There are a lot of friends like that there. Haha. Are you going to go back?”

At any time, usually, I just answer so. However, I didn’t hate the middle -aged male drivers who had many middle -aged male drivers. One day, with the idea that it might come back.

“I really love someone.”

I sensed the driver’s ears prick up.

“I came here to find that person, but now that we have met, I’m just going back.”

“May I ask what you mean?”

“It means it’s difficult to take her to Macau today.”

A sigh-like sound came out of the taxi driver’s mouth.

“Macau! You’re really something. You’re no ordinary person. Are you someone of high status, or do you do something very difficult?”

“What do you say.”

“By the way, it’s strange. The guest is a man, but he wants to be a woman, so he did a transsexual surgery.

I couldn’t answer the question.

It didn’t take long to cross the sea and enter Macau, all the more so because I took a direct route to Macau without passing through Hong Kong.

After breaking up with the knife, the body and mind were left alone to rich in this way. Now I wanted to finish this long trip, but I didn’t know when it would end.

I stayed not at the luxurious hotel where Sawa’s group was staying, but at a travelers’ hotel near the hospital where Yeoul was hospitalized. Sawa came running and told me the whole story. At the end she sighed and said:

“Zen, the reason I called you is that her suicide attempt was because of you.”

“What did I do?”

“Do not say that way. Anyway, the streams of the bitch. I married Jasper and shouted to break up with you, but after that, it was painful. To be honest, I have to worry about Zesper than now. I said it was because of Jen because I couldn’t blame myself that it was because of something wrong.

“I thought it would be better to talk to you for a while before I went to the stream, but I heard it, so I would not want to go to Yeoul now.”

“No, go. Go. I’m waiting for you. I’m well Tyler and returned to Hong Kong. Fortunately, I don’t want to break up with the strain. Soothing. For now, the only person who can soothe the strain is that you are the only one. ”

0. Previously on ...

Last time, we finally found the link between dynamic programming and reinforcement learning. When the model is known, the problem can be solved with dynamic programming (value iteration, policy iteration). When it is not, rather than estimating the model, we can use samples gathered by interacting with the environment to find a policy that obtains the maximum reward. There are two directions for reinforcement learning: value-based RL, which learns values, and policy-based RL, which learns the policy itself. The advantages and types of both were covered in “Learning Enhancement Learning”.

1. Monte Carlo Simulation (Method)

Let’s take a look at MC, Monte Carlo, the simplest and easiest method to understand. Monte Carlo is one of Monaco’s administrative districts. The method is named after the city, which is famous for its casinos, that is, for games of chance. Had it been named in Korea, it might have been called the Gangwon method. In any case, the Monte Carlo method is widely used not only in reinforcement learning but whenever the model is difficult to know. It is also famous for having been used to study the behavior of neutrons. For example, to estimate the value of pi using the area of a quarter circle, keep throwing darts at a 1*1 square. If the landing points are uniformly random, the fraction of all darts that land inside the quarter circle approaches the ratio of the quarter circle’s area to the square’s area, that is, pi/4 to 1. The MC method is also called MC simulation: the process of estimating a value through simulation.
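The dart-throwing example can be sketched in a few lines of Python. This is only a minimal illustration: the function name `estimate_pi` and the `seed` parameter are made up here, and the darts are assumed to land uniformly in the unit square.

```python
import random

def estimate_pi(num_darts, seed=0):
    """Estimate pi by throwing darts uniformly at the 1*1 square."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    inside = 0
    for _ in range(num_darts):
        x, y = rng.random(), rng.random()
        # The dart is inside the quarter circle of radius 1 if x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / num_darts approximates pi/4, the quarter circle's area.
    return 4.0 * inside / num_darts

for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n))
```

The estimate gets closer to pi as the number of darts grows, but only slowly: the error shrinks roughly like one over the square root of the number of samples.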

Monte Carlo simulation has its problems. If the samples are not spread over two dimensions but over a high-dimensional space, the method struggles. If the wind blows while you throw the darts so that they land in the wrong places, simply counting the results is no longer enough. And if the number of samples is small, can the result be trusted? The Monte Carlo method needs many samples, works best in low dimensions, and requires knowing the probability distribution being sampled.


2. Incremental Average

The incremental average expresses the relationship between the average of the first N-1 samples and the average of the first N samples, in the form of a recurrence relation, when computing the average of given data. With a step size of 1/n it is the ordinary arithmetic average; using a general step size alpha n generalizes it.

The average of N samples is the average of the first N-1 samples plus alpha n times the difference between the new sample and the previous average: mean_n = mean_{n-1} + alpha_n * (x_n - mean_{n-1}). The step size alpha n sets how much influence the new data has when the new average is calculated. The difference in parentheses is called the “surprise of new data.” Perhaps, like the four-character idiom about learning the new by reviewing the old, it keeps what came before and applies what is new.
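As a small sketch of this update rule, the Python function below computes a running average one sample at a time. The name `incremental_mean` and the optional `step_size` argument are illustrative choices, not part of the original post.

```python
def incremental_mean(samples, step_size=None):
    """Running average: mean += alpha * (x - mean).

    With step_size=None the step is 1/n, which gives the arithmetic mean.
    A constant step_size weights recent samples more heavily.
    """
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        alpha = 1.0 / n if step_size is None else step_size
        surprise = x - mean          # the "surprise of new data"
        mean += alpha * surprise
    return mean

print(incremental_mean([2, 4, 6, 8]))        # arithmetic mean of the data
print(incremental_mean([2, 4, 6, 8], 0.5))   # recency-weighted average
```

Only the previous average and the sample count need to be stored, which is why this form is convenient when samples arrive one episode at a time.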

3. Robbins-Monro Condition on Step Size

The step size alpha n must meet two conditions: the infinite series of alpha n must diverge, and the infinite series of the square of alpha n must converge. The first condition keeps the step sizes from shrinking too quickly, so that the estimate can keep moving (exploring) as far as it needs to; the second makes the steps shrink enough that the estimate eventually converges. This is called the Robbins-Monro condition, and it applies in many fields besides reinforcement learning. If the step size is a constant, the infinite series of the square of alpha n diverges and the RM condition is violated. The theme that appears here recurs throughout reinforcement learning: the dilemma of exploitation and exploration. You should not converge too quickly, but you should not explore too much either, and this trade-off runs through the whole field.
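A quick numerical check can make the two conditions concrete. The sketch below compares partial sums for the step size 1/n with those for a constant step size; the helper names and the constant 0.1 are arbitrary choices for illustration, and finite partial sums only suggest, not prove, divergence or convergence.

```python
import math

def partial_sums(step_size, num_terms):
    """Return (sum of alpha_n, sum of alpha_n squared) over n = 1..num_terms."""
    total, total_sq = 0.0, 0.0
    for n in range(1, num_terms + 1):
        alpha = step_size(n)
        total += alpha
        total_sq += alpha * alpha
    return total, total_sq

def harmonic_step(n):
    return 1.0 / n   # satisfies both Robbins-Monro conditions

def constant_step(n):
    return 0.1       # sum of squares keeps growing, so the condition fails

for terms in (10, 1_000, 100_000):
    print(terms, partial_sums(harmonic_step, terms), partial_sums(constant_step, terms))
print("limit of the 1/n^2 series:", math.pi ** 2 / 6)
```

For 1/n the first sum keeps growing (slowly, like a logarithm) while the sum of squares levels off below pi^2/6. For the constant step both sums grow without bound.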

4. Monte Carlo Policy Evaluation (Prediction)

Now that we have the prerequisites, let’s find out how Monte Carlo, the first method of value-based RL, works. In a situation where learning is needed, you usually do not know the state transition probability or the reward model, so DP cannot be used. Instead, we learn the value of each state from sample trajectories, use the values to determine the policy, and then collect sample trajectories again.

The reward of the terminal state is set to 0 (as a reference point), and the rewards received at each state transition are accumulated to calculate the total discounted reward, called the return. The return changes from one sample trajectory to the next. The state value is the expected value of the return given that the state is s.
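The return for every step of a trajectory can be computed in one backward pass. The sketch below assumes, as a convention chosen here, that `rewards[t]` is the reward received when leaving the state visited at step t, and that `gamma` is the discount factor.

```python
def discounted_returns(rewards, gamma):
    """Return the list G where G[t] = rewards[t] + gamma * G[t+1]."""
    returns = [0.0] * len(rewards)
    g = 0.0  # the return after the terminal state is 0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

print(discounted_returns([1, 1, 1], gamma=0.5))
```

Working backward avoids recomputing the discounted sum from scratch for every time step.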

The algorithm makes the workings of the MC method more intuitive. The terms every-visit and first-visit appear in it and are explained shortly. First, all state values are initialized to 0 and a sample trajectory is generated. This is called an episode. Once the episode is obtained, the return is calculated, and the state value is updated from this return with the incremental average (“surprise of new data”).

First-visit and every-visit, mentioned above, are simple. The same state can be visited several times in one episode. The question is whether to update the value every time the state is visited (every-visit), or to update it only on the first visit and skip the later ones (first-visit). Every-visit looks more intuitive, but both are fine. Either can be applied depending on the problem, and if one person applies every-visit and another first-visit to the same problem, the estimated values will differ.
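The procedure can be sketched as follows. This is a minimal illustration rather than a reference implementation: an episode is assumed to be a list of `(state, reward)` pairs, where the reward is the one received on leaving that state, and the function name `mc_prediction` and the toy episode are made up for the example.

```python
from collections import defaultdict

def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate state values from sampled episodes with incremental averages."""
    values = defaultdict(float)  # every state value starts at 0
    counts = defaultdict(int)    # number of returns averaged per state
    for episode in episodes:
        # Record the time step at which each state is first visited.
        first_index = {}
        for t, (state, _) in enumerate(episode):
            first_index.setdefault(state, t)
        g = 0.0
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_visit and first_index[state] != t:
                continue  # first-visit: ignore later visits to the same state
            counts[state] += 1
            values[state] += (g - values[state]) / counts[state]
    return dict(values)

# Toy episode in which state "A" is visited twice.
episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
print(mc_prediction([episode], first_visit=True))
print(mc_prediction([episode], first_visit=False))
```

On this single episode the two variants already disagree about the value of "A", because every-visit also averages in the return from the second visit.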

As an additional note, MC generates an episode and updates only the states that appear in it, so it cannot be said that every state gets updated. Now for the most important properties of MC: bias and variance. Bias is how far the estimate is off from the true value on average, and variance is how widely the estimates scatter around their own average.

The value estimate of the MC method is unbiased because it is updated using actual returns. And because MC computes the return from a single whole episode, it has a large variance: the return depends heavily on the particular episode.


Policy PI is required for control.

In DP, the Q value can be calculated from the V value using the model, but if there is no model, the Q value must be estimated directly from episodes.
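A minimal sketch of estimating Q values from an episode is shown below. The episode format of `(state, action, reward)` triples, the every-visit update, and the names `update_q_from_episode` and `greedy_action` are assumptions made for this illustration.

```python
from collections import defaultdict

def update_q_from_episode(q, counts, episode, gamma=1.0):
    """Every-visit update of Q(s, a) from one episode of (state, action, reward)."""
    g = 0.0
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        key = (state, action)
        counts[key] += 1
        q[key] += (g - q[key]) / counts[key]  # incremental average of returns

def greedy_action(q, state, actions):
    """Pick the action with the highest estimated Q value in this state."""
    return max(actions, key=lambda a: q[(state, a)])

q = defaultdict(float)
counts = defaultdict(int)
update_q_from_episode(q, counts, [("s0", "right", 0.0), ("s1", "right", 1.0)], gamma=0.9)
print(greedy_action(q, "s0", ["left", "right"]))
```

Because Q is indexed by state and action, the greedy policy can be read off directly without a transition model.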

The advantages of MC control are that it is simple and easy to parallelize. It is also powerful because it needs neither the full model (it learns by interacting with the environment without knowing the model) nor the full set of states. And because the Bellman equation is not used, it can be applied even when the memoryless (Markov) property of an MDP does not hold.

Note 1. GLIE Policy

MC control is carried out with a GLIE (Greedy in the Limit with Infinite Exploration) policy: one that explores infinitely often and, in the limit, takes the best action.
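One common way to obtain such a policy is epsilon-greedy action selection with an epsilon that decays toward zero, for example 1/k in episode k. The sketch below shows this particular choice. The schedule and function names are illustrative assumptions, not something the original post specifies.

```python
import random

def glie_epsilon(k):
    """Exploration rate for episode k (k >= 1); decays to 0 as k grows."""
    return 1.0 / k

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # hypothetical Q estimates for three actions
for k in (1, 10, 100):
    print(k, glie_epsilon(k), epsilon_greedy(q_values, glie_epsilon(k), rng))
```

Early episodes choose actions almost at random, while later episodes are nearly always greedy, yet every action keeps a nonzero chance of being tried.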

Note 2. Disadvantages of MC Method

The disadvantage of the MC method is its high variance. Despite its advantages, the high variance makes it virtually impractical. This impracticality is the motivation for TD methods, the topic of the next post.