Channel: RapidMiner Studio Forum topics
Viewing all 2122 articles

How to properly connect to MongoDB Atlas (cloud)?


Hi there, I am currently trying to set up a connection to MongoDB in the cloud (Atlas). While I am able to connect to the admin database, I am not able to connect to other databases. I can connect to these using a dedicated MongoDB tool (like Robo 3T) but not using RapidMiner. So either my settings in RapidMiner or in Atlas require something special to use any database other than admin.

Has anyone ever tried something similar? I am not sure whether the problem is RapidMiner-related or Atlas-related, and there is hardly any documentation to be found.
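
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: Atlas database users are normally defined in the admin database, so a client that opens another database may need to name admin as the authentication database (the `authSource` connection-string option). The sketch below only builds such a URI with the Python standard library; the host, user, password, and database names are hypothetical placeholders, and how RapidMiner's connection dialog maps to these options is not covered here.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(user, password, host, database, auth_source="admin"):
    """Build a MongoDB URI that authenticates against `auth_source`
    while using `database` as the default database."""
    # Credentials must be percent-encoded inside a MongoDB URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    options = urlencode({"authSource": auth_source, "ssl": "true"})
    return f"mongodb://{credentials}@{host}/{database}?{options}"


# Hypothetical values, for illustration only.
uri = build_mongo_uri("rm_user", "p@ss/word",
                      "cluster0.example.mongodb.net:27017", "sales")
print(uri)
```

If a dedicated tool connects fine, comparing the authentication database it uses with the one configured in RapidMiner may narrow the problem down.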


How to read an ARFF file in Weka


Hello. I collected data from Twitter and want to convert it to ARFF format with the Write ARFF operator.
But after the conversion the file does not open in Weka; there is an error. I tried again, but it still did not work.
Does anyone know the cause?
Attachments: aa1.JPG, aa2.JPG

Time series questions about window size, step size, and horizon


I have been learning to solve time series problems in RapidMiner. I have already read several earlier replies and the Financial Market Model class on the blog website, and I have the basic templates. I am still confused about how to choose the right values for window size, step size, and horizon.

Here are my questions. I would really appreciate some tips!

1. Is it better to simply set the window size equal to the length of the cycle? For example, 7 for data with a weekly period, 12 for data with a yearly period.

2. How should I set the training and test window widths in Sliding Validation? Should they also equal the length of the period?

3. How do I use RMSE to evaluate performance in RapidMiner?

4. If my original example set contains both nominal and numeric data, should I do any transformation? Which model would work better in that case: SVM, ANN, or polynomial?
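
To make the three parameters in questions 1 to 3 concrete, here is a minimal Python sketch of generic windowing and RMSE. It is an illustration under stated assumptions, not RapidMiner's exact operator semantics: the function names are hypothetical, the series is made up, and the horizon is taken to mean "how many steps after the end of the window the target lies".

```python
import math


def sliding_windows(series, window_size, step_size, horizon):
    """Return (window, target) pairs: each window holds `window_size`
    consecutive values, the target lies `horizon` steps after the window's
    last value, and consecutive windows start `step_size` values apart."""
    pairs = []
    last_start = len(series) - window_size - horizon
    for start in range(0, last_start + 1, step_size):
        window = series[start:start + window_size]
        target = series[start + window_size + horizon - 1]
        pairs.append((window, target))
    return pairs


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy series with a weekly cycle (period 7); the values are made up.
series = [10, 12, 14, 13, 11, 20, 22] * 4  # 28 values
pairs = sliding_windows(series, window_size=7, step_size=1, horizon=1)

# Naive seasonal forecast: predict the value one full cycle earlier, which
# is the first value of a length-7 window when the horizon is 1.
actual = [target for _, target in pairs]
predicted = [window[0] for window, _ in pairs]
print(len(pairs), rmse(actual, predicted))
```

With a window as long as the cycle, each window contains exactly one full period, which is why the naive seasonal forecast can read its prediction straight from the window. Whether that is the best window size for a given model is an empirical question that RMSE on held-out windows can help answer.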

 

Many thanks!

 

Help With Loop Collection


I'm new at this, so please excuse my ignorance. My goal is to train a model on a data set of independent events. I'm using Group Into Collection and Loop Collection. When I set a breakpoint after Group Into Collection, it looks great: the data is split into N example sets. I'm not sure what to put inside Loop Collection. Currently I have: Split Data + Linear Regression + Apply Model + Performance. I am using the Append operator after Loop Collection, but I'm not returning an example set (EXA) from Loop Collection. Any guidance would be greatly appreciated.

Thanks

How to connect an RNN layer to another RNN layer in Keras


Hi everyone,

I have been facing this trouble for quite some time and I really hope to get a reply on this. Is this due to a problem with the Keras installation, or do I just not know the proper way of putting the layers together? Thank you, guys.

Value error: Error when checking input


Hi, I tried to construct the architecture shown below.
I have 9000 examples, each consisting of four inputs and one output. I split the data in a ratio of 8 to 1. When I try to run the process, it shows the error: Execution of Python script failed.
ValueError: Error when checking the input: expected conv1d_1_input to have 3 dimensions, but got array with shape (7200,4) (script, line 295).
Does this mean that I should set my input shape to (7200,4,None)? Or is there another method to solve the problem? Where can we check the script? Your advice is highly appreciated. Thank you.

Value error.PNG
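
As a hedged illustration of what the error message is asking for: a Conv1D input is three-dimensional, (samples, steps, channels), while the data here is two-dimensional, (7200, 4). The sketch below uses NumPy with a zero-filled stand-in array (not the real data) and treats the four inputs as four steps with one channel, which is an assumption about the intended layout. How to apply the same reshaping inside the RapidMiner Keras extension is not shown.

```python
import numpy as np

# Stand-in for the 7200 training rows with four inputs each (zeros, not real data).
X = np.zeros((7200, 4))

# Conv1D layers expect (samples, steps, channels). Treat the four inputs
# as four steps with a single channel by appending a trailing axis.
X_3d = X.reshape(X.shape[0], X.shape[1], 1)

# The per-sample input shape for the first layer excludes the sample count.
input_shape = X_3d.shape[1:]
print(X_3d.shape, input_shape)
```

Under this layout the per-sample input shape would be (4, 1), not (7200,4,None): the number of samples is not part of the layer's input shape.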

How to use Rosette Text Analysis

Hello. I want to normalize elongated words by removing the repeated letters. For example, the word veeeryy should become very. I looked at the Rosette Text Analysis operators, but I do not know which operator to use for this purpose. Can someone guide me?
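
Independently of any particular operator, the transformation in the example can be expressed as a regular expression that collapses runs of a repeated character. This is a minimal Python sketch with a hypothetical helper name; note the stated limitation that it also damages legitimate double letters.

```python
import re


def squeeze_repeats(token):
    """Collapse every run of a repeated character into a single character."""
    return re.sub(r"(.)\1+", r"\1", token)


print(squeeze_repeats("veeeryy"))
print(squeeze_repeats("good"))  # caveat: legitimate double letters collapse too
```

Because words such as "good" are shortened as well, a dictionary lookup after squeezing (or collapsing only runs of three or more characters) may be needed in practice.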

visualisation: multiple rows in one chart


I have 3 columns:

- item_id (4000 items)

- location (5 distinct locations altogether)

- date (35 distinct dates altogether)

I would like to draw a chart with

- x axis: date

- y axis: location

and plot the location and date for every single item in one chart, with one line per item. That way I can see in the chart how the locations changed over time for a lot of items.

If there is a limit on the number of items, that is fine. I tried custom charts but did not succeed.

Thanks for your help.

 

 

 


Association rules, lift parameter


Hey, I have a process that extracts association rules. I want to know how the lift parameter is interpreted.

According to the RapidMiner tutorial:

"Values close to 1 imply that X and Y are independent and the rule is not interesting."

So, are all the rules generated by my process worthless because their lift is close to 1? Which of them are interesting?
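
For reference, lift is usually defined as support(X and Y) / (support(X) * support(Y)), so values above 1 mean X and Y occur together more often than independence would predict and values below 1 mean less often. A small Python sketch on made-up baskets (the function name and data are hypothetical):

```python
def lift(transactions, antecedent, consequent):
    """lift(X -> Y) = support(X and Y) / (support(X) * support(Y))."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)
    supp_x = sum(x <= t for t in transactions) / n
    supp_y = sum(y <= t for t in transactions) / n
    supp_xy = sum((x | y) <= t for t in transactions) / n
    return supp_xy / (supp_x * supp_y)


# Made-up baskets for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
print(lift(baskets, {"bread"}, {"butter"}))  # above 1: positive association
print(lift(baskets, {"bread"}, {"milk"}))    # below 1: negative association
```

How far from 1 a lift has to be before a rule counts as interesting depends on the data and the number of supporting transactions; the tutorial's statement only says that values near 1 carry little information.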

 

Client Ideas/ Suggestion Sorting to Merge

 

I was hoping to get a little guidance on building a process that will help me organize data for a work project that I have been losing sleep over. Here is the problem and what I am trying to accomplish: I work for a construction technology company that uses Uservoice as a means to gather client suggestions and ideas, which are then voted on by our other clients. Unfortunately, many tickets are created as separate user posts with the same content as existing tickets; essentially the same idea, worded a little differently. Because of this, we have multiple similar tickets with client users voting on each of them. So my responsibility, under a time crunch, is to sort the tickets so that tickets asking similar questions are grouped together. Then I can export to Excel and have all my similar tickets together, ready to copy and paste into Uservoice, find them, and bulk merge them into one ticket in our account. This way all votes are combined and we get a true picture of what our clients' needs are, so development resources can be allocated appropriately.

Please find attached a sample of what I am trying to accomplish, for the visual folks out there. You will notice that the first tab shows how I receive the tickets and the second tab shows how, in a perfect world, they would be sorted by the RapidMiner process. Text color is used only to distinguish the different client requests with votes on them from other users. Currently, I have about 3000 tickets that need to be sorted by the end of Q2. This has caused a lot of stress and lost sleep. You have no idea how helpful this would be!!

Please let me know if you can help! Any process suggestions or extension recommendations to accomplish this would be really really appreciated!  Look forward to hearing back.

Thanks! 
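
The grouping described above can be sketched outside RapidMiner as follows. This is a minimal illustration, not a recommended production method: it uses Jaccard similarity on word sets with a greedy grouping pass, and the ticket texts, function names, and the 0.5 threshold are all made-up assumptions.

```python
def tokens(text):
    """Lowercased word set of a ticket title, with basic punctuation stripped."""
    return {word.strip(".,!?").lower() for word in text.split()}


def jaccard(a, b):
    """Share of words two texts have in common (0 = none, 1 = identical sets)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def group_similar(tickets, threshold=0.5):
    """Greedy grouping: add each ticket to the first group whose first
    ticket is similar enough, otherwise start a new group."""
    groups = []
    for ticket in tickets:
        for group in groups:
            if jaccard(ticket, group[0]) >= threshold:
                group.append(ticket)
                break
        else:
            groups.append([ticket])
    return groups


# Made-up ticket titles for illustration.
tickets = [
    "Add dark mode to the mobile app",
    "Export reports to PDF",
    "Please add dark mode to mobile app",
    "Allow PDF export of reports",
]
groups = group_similar(tickets)
for group in groups:
    print(group)
```

Word overlap misses tickets that express the same idea with different vocabulary, so the threshold would need tuning on real tickets and the resulting groups still need a manual review before merging.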

Execution of python script failed: Value error


I tried to run my data using the built-in example architecture (s&p-500-regression). In the Keras Model interface, I changed the input shape to (4,) because I have four inputs, and I added a core layer [Reshape, (1,4)].
I get an error: Execution of Python script failed. Please check your Python script: ValueError: total size of new array must be unchanged.
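
The message "total size of new array must be unchanged" means that a reshape was asked to produce a shape with a different number of elements than its input. A small Python check of that rule, using a hypothetical helper and a hypothetical mismatching target shape:

```python
from math import prod


def reshape_is_valid(input_shape, target_shape):
    """A reshape is only possible when both shapes hold the same number of elements."""
    return prod(input_shape) == prod(target_shape)


print(reshape_is_valid((4,), (1, 4)))   # 4 elements on both sides
print(reshape_is_valid((4,), (1, 50)))  # 4 elements versus 50
```

By this rule, reshaping (4,) to (1,4) is consistent on its own, so one possibility (an assumption, not verified against the process) is that another layer left over from the built-in example still expects the original input size.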

Customer retention through RapidMiner


I have attached the data set.

 

 

A telecommunications company recently discovered that some of their customers are leaving their landline business for cable competitors, and they are concerned about this loss. You are an analyst at this company and are tasked to understand who is leaving and why and also to develop new customer retention programs. Based on your analysis of the data, what recommendations do you have for the telecommunications company?

This information will be used to design new customer retention programs for senior leaders at the telecommunications company. You will be required to (1) show and interpret your analysis of the data and (2) use your analysis to describe the new programs you will be proposing.

Provide a written analytical report that walks Senior Leaders through your analysis of customer behavior and the new customer retention programs that you will be proposing. In other words, your Analytic Plan should include:

  1. Documentation of data analysis and results

    1. Provide the working process streams demonstrating how the data are used, what algorithms are used, and the ultimate output

    2. Interpretation of all outputs and how these outputs help in your design of the new programs

  2. New programs you are proposing

    1. The value it will provide to the customer

    2. How the data will be used in creating and implementing the programs

    3. Inputs required from the customer

    4. How the customer will interact with the programs

    5. Visual representations of your new programs including diagrams showing process flow and/or mock-ups of interfaces

Text Mining/ Data Engineer Services Needed.


Hi all! 

I have been trying to teach myself how to text mine for a work project, and after two days of struggling to accomplish what I intended on, I am throwing in the towel and seeking a professional! Due to time constraints, I can't really waste any more time trying to learn how to create the model that will work. If any expert level Rapid miners or data engineers are interested, please shoot me a message to chat project needs and any fee it would cost. Appreciate it! 

 

Thanks! 

Kevin

Rapidminer on HPUX and Solaris/Sparc?


Hi All, 

I need help on installing rapidminer on HPUX and Solaris/Sparc. Any clue?

 

Thanks

AUPRC with imbalanced classes


Hi, it seems I am not getting the expected results when using Performance (AUPRC) with a highly imbalanced dataset.

The relationship between recall and precision of the positive class seems pretty intuitive, but I still get AUPRC = 0.010 regardless of anything:

Screenshots: Screenshot 2018-04-25 23.28.32.png, Screenshot 2018-04-25 23.28.14.png

Here I am using the imbalanced credit card fraud dataset.

At the same time, when I artificially balance the data, AUPRC shows the expected 'normal' values:

Screenshots: Screenshot 2018-04-25 23.35.06.png, Screenshot 2018-04-25 23.34.59.png
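
One general property that may help when reading these numbers, stated without diagnosing this particular process: the area under the precision-recall curve depends on how rare the positive class is, and an uninformative ranking scores roughly the fraction of positives. The Python sketch below computes AUPRC as average precision, which is one common estimator (the Performance (AUPRC) operator may compute it differently), on made-up labels and scores.

```python
def average_precision(labels, scores):
    """Average precision: the mean of the precision values measured at the
    rank of each positive example, with examples sorted by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / positives


# Made-up example: 2 positives among 10 examples (prevalence 0.2).
labels = [1, 1] + [0] * 8
good_scores = [0.9, 0.8] + [0.1] * 8  # positives ranked first
bad_scores = [0.1, 0.2] + [0.9] * 8   # positives ranked last

ap_good = average_precision(labels, good_scores)
ap_bad = average_precision(labels, bad_scores)
prevalence = sum(labels) / len(labels)
print(ap_good, ap_bad, prevalence)
```

Comparing an AUPRC value with the positive-class prevalence of the same data gives a sense of whether it is better than an uninformative ranking, which also explains why values from a balanced sample and from the original imbalanced data are not directly comparable.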

Process attached:

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process"><process expanded="true"><operator activated="true" class="retrieve" compatibility="8.1.003" expanded="true" height="68" name="Retrieve creditcard" width="90" x="45" y="34"><parameter key="repository_entry" value="../data/creditcard"/></operator><operator activated="true" class="sample" compatibility="8.1.003" expanded="true" height="82" name="equalize classes" width="90" x="179" y="34"><parameter key="balance_data" value="true"/><list key="sample_size_per_class"><parameter key="1" value="492"/><parameter key="0" value="492"/></list><list key="sample_ratio_per_class"/><list key="sample_probability_per_class"/></operator><operator activated="false" class="sample_stratified" compatibility="8.1.003" expanded="true" height="82" name="sample 50k" width="90" x="45" y="340"><parameter key="sample_size" value="50000"/></operator><operator activated="false" class="create_threshold" compatibility="8.1.003" expanded="true" height="68" name="Create Threshold" width="90" x="581" y="391"><parameter key="threshold" value="0.09"/><parameter key="first_class" value="0"/><parameter key="second_class" value="1"/></operator><operator activated="true" class="split_data" compatibility="8.1.003" expanded="true" height="103" name="Split Data" width="90" x="246" y="136"><enumeration key="partitions"><parameter key="ratio" value="0.8"/><parameter key="ratio" value="0.2"/></enumeration><parameter key="sampling_type" value="stratified sampling"/></operator><operator activated="true" class="concurrency:cross_validation" compatibility="8.1.003" expanded="true" height="145" name="Validation" width="90" x="380" y="34"><parameter key="sampling_type" value="shuffled sampling"/><process expanded="true"><operator activated="false" class="concurrency:parallel_decision_tree" compatibility="8.1.003" 
expanded="true" height="103" name="Decision Tree" width="90" x="112" y="136"><parameter key="apply_pruning" value="false"/><parameter key="apply_prepruning" value="false"/></operator><operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="246" y="34"><list key="beta_constraints"/><list key="expert_parameters"/></operator><operator activated="false" class="h2o:deep_learning" compatibility="7.6.001" expanded="true" height="82" name="Deep Learning" width="90" x="380" y="136"><enumeration key="hidden_layer_sizes"><parameter key="hidden_layer_sizes" value="50"/><parameter key="hidden_layer_sizes" value="50"/></enumeration><enumeration key="hidden_dropout_ratios"/><list key="expert_parameters"/><list key="expert_parameters_"/></operator><operator activated="false" class="stacking" compatibility="8.1.003" expanded="true" height="68" name="Stacking" width="90" x="179" y="289"><process expanded="true"><operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model (2)" width="90" x="179" y="187"><list key="beta_constraints"/><list key="expert_parameters"/></operator><operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.003" expanded="true" height="103" name="Decision Tree (2)" width="90" x="112" y="34"><parameter key="apply_pruning" value="false"/><parameter key="apply_prepruning" value="false"/></operator><operator activated="true" class="h2o:deep_learning" compatibility="7.6.001" expanded="true" height="82" name="Deep Learning (2)" width="90" x="112" y="340"><enumeration key="hidden_layer_sizes"><parameter key="hidden_layer_sizes" value="20"/><parameter key="hidden_layer_sizes" value="20"/></enumeration><enumeration key="hidden_dropout_ratios"/><list key="expert_parameters"/><list key="expert_parameters_"/></operator><connect from_port="training set 
1" to_op="Decision Tree (2)" to_port="training set"/><connect from_port="training set 2" to_op="Generalized Linear Model (2)" to_port="training set"/><connect from_port="training set 3" to_op="Deep Learning (2)" to_port="training set"/><connect from_op="Generalized Linear Model (2)" from_port="model" to_port="base model 2"/><connect from_op="Decision Tree (2)" from_port="model" to_port="base model 1"/><connect from_op="Deep Learning (2)" from_port="model" to_port="base model 3"/><portSpacing port="source_training set 1" spacing="0"/><portSpacing port="source_training set 2" spacing="0"/><portSpacing port="source_training set 3" spacing="0"/><portSpacing port="source_training set 4" spacing="0"/><portSpacing port="sink_base model 1" spacing="0"/><portSpacing port="sink_base model 2" spacing="0"/><portSpacing port="sink_base model 3" spacing="0"/><portSpacing port="sink_base model 4" spacing="0"/></process><process expanded="true"><operator activated="true" class="h2o:generalized_linear_model" compatibility="7.6.001" expanded="true" height="124" name="Generalized Linear Model (3)" width="90" x="45" y="34"><list key="beta_constraints"/><list key="expert_parameters"/></operator><connect from_port="stacking examples" to_op="Generalized Linear Model (3)" to_port="training set"/><connect from_op="Generalized Linear Model (3)" from_port="model" to_port="stacking model"/><portSpacing port="source_stacking examples" spacing="0"/><portSpacing port="sink_stacking model" spacing="0"/></process></operator><connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/><connect from_op="Generalized Linear Model" from_port="model" to_port="model"/><portSpacing port="source_training set" spacing="0"/><portSpacing port="sink_model" spacing="0"/><portSpacing port="sink_through 1" spacing="0"/></process><process expanded="true"><operator activated="true" class="apply_model" compatibility="8.1.003" expanded="true" height="82" name="apply on train" width="90" 
x="45" y="34"><list key="application_parameters"/></operator><operator activated="true" class="operator_toolbox:performance_auprc" compatibility="1.0.000" expanded="true" height="82" name="perf train" width="90" x="246" y="34"><parameter key="main_criterion" value="AUPRC"/><parameter key="AUC" value="true"/><parameter key="AUPRC" value="true"/></operator><connect from_port="model" to_op="apply on train" to_port="model"/><connect from_port="test set" to_op="apply on train" to_port="unlabelled data"/><connect from_op="apply on train" from_port="labelled data" to_op="perf train" to_port="labelled data"/><connect from_op="perf train" from_port="performance" to_port="performance 1"/><connect from_op="perf train" from_port="example set" to_port="test set results"/><portSpacing port="source_model" spacing="0"/><portSpacing port="source_test set" spacing="0"/><portSpacing port="source_through 1" spacing="0"/><portSpacing port="sink_test set results" spacing="0"/><portSpacing port="sink_performance 1" spacing="0"/><portSpacing port="sink_performance 2" spacing="0"/></process></operator><operator activated="true" class="apply_model" compatibility="8.1.003" expanded="true" height="82" name="apply on test" width="90" x="581" y="136"><list key="application_parameters"/></operator><operator activated="false" class="select_recall" compatibility="8.1.003" expanded="true" height="82" name="Select Recall" width="90" x="581" y="289"><parameter key="min_recall" value="0.8"/><parameter key="positive_label" value="1"/></operator><operator activated="false" class="apply_threshold" compatibility="8.1.003" expanded="true" height="82" name="Apply Threshold" width="90" x="715" y="289"/><operator activated="true" class="performance" compatibility="8.1.003" expanded="true" height="82" name="perf test" width="90" x="715" y="136"/><operator activated="true" class="operator_toolbox:performance_auprc" compatibility="1.0.000" expanded="true" height="82" name="perf test (2)" width="90" x="849" 
y="136"><parameter key="main_criterion" value="AUPRC"/><parameter key="accuracy" value="false"/><parameter key="AUPRC" value="true"/></operator><connect from_op="Retrieve creditcard" from_port="output" to_op="equalize classes" to_port="example set input"/><connect from_op="equalize classes" from_port="example set output" to_op="Split Data" to_port="example set"/><connect from_op="Split Data" from_port="partition 1" to_op="Validation" to_port="example set"/><connect from_op="Split Data" from_port="partition 2" to_op="apply on test" to_port="unlabelled data"/><connect from_op="Validation" from_port="model" to_op="apply on test" to_port="model"/><connect from_op="Validation" from_port="performance 1" to_port="result 1"/><connect from_op="apply on test" from_port="labelled data" to_op="perf test" to_port="labelled data"/><connect from_op="Select Recall" from_port="example set" to_op="Apply Threshold" to_port="example set"/><connect from_op="Select Recall" from_port="threshold" to_op="Apply Threshold" to_port="threshold"/><connect from_op="perf test" from_port="performance" to_op="perf test (2)" to_port="performance"/><connect from_op="perf test" from_port="example set" to_op="perf test (2)" to_port="labelled data"/><connect from_op="perf test (2)" from_port="performance" to_port="result 2"/><connect from_op="perf test (2)" from_port="example set" to_port="result 3"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/><portSpacing port="sink_result 3" spacing="0"/><portSpacing port="sink_result 4" spacing="0"/></process></operator></process>

 

 


Words/String Matching Producing true or false


I have a data set for example:

 

Internal Experience Functional Area
Marketing & Sales
Marketing & Sales
Controlling/Accounting
Marketing & Sales|Marketing & Sales
General Management
Marketing & Sales
Logistics|Logistics|Logistics|Logistics
Logistics
Marketing & Sales

I want to match it against my requirements .xlsx file, which contains the column:

Match words
sales

 

The matching is a string match and is not case sensitive, meaning it should work regardless of lowercase or capital letters. After matching, it should give me a result of true or false (or 1 or 0). The result should look like this:

Internal Experience Functional Area        Matching result
Marketing & Sales                          TRUE
Marketing & Sales                          TRUE
Controlling/Accounting                     FALSE
Marketing & Sales|Marketing & Sales        TRUE
General Management                         FALSE
Marketing & Sales                          TRUE
Logistics|Logistics|Logistics|Logistics    FALSE
Logistics                                  FALSE
Marketing & Sales                          TRUE


I don't know how this can be done. Please help.
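
The matching rule described in this post (case-insensitive substring match against a list of words, yielding true or false per row) can be sketched in Python as follows. The helper name is hypothetical and the values are copied from the sample above; this illustrates the logic only, not a RapidMiner process.

```python
def contains_any(value, keywords):
    """True if any keyword occurs in value, ignoring case."""
    lowered = value.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


areas = [
    "Marketing & Sales",
    "Marketing & Sales",
    "Controlling/Accounting",
    "Marketing & Sales|Marketing & Sales",
    "General Management",
    "Marketing & Sales",
    "Logistics|Logistics|Logistics|Logistics",
    "Logistics",
    "Marketing & Sales",
]
keywords = ["sales"]  # the "Match words" column

results = [contains_any(area, keywords) for area in areas]
for area, matched in zip(areas, results):
    print(area, matched)
```

Lowercasing both sides before the comparison is what makes the match case-insensitive; the same idea applies to any tool that offers a lowercase function and a "contains" test.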

Problem with hierarchical clustering


Hello. I used the Process Documents from Data operator with TF-IDF.
I used the Top Down Clustering and Agglomerative Clustering operators.
How do I optimize the number of clusters?
And how do I evaluate them?
Can I use Cluster Distance Performance?
Please advise.
Thank you.

How to filter examples where two attributes have the same value?


I have 4 attributes in one table and need to keep the examples where attribute A's value equals attribute B's value. I don't know which operator would work. I tried Filter Examples with the attribute value filter, but I don't know which expression would work ('A = B' didn't work; it is interpreted as A's value = the literal B).

 

Does anybody know how to do that? Thx!
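
The difference between the two readings of 'A = B' can be shown with a small Python sketch: comparing A with the constant "B" versus comparing A with the value of B in the same row. The rows and attribute values are made up, and the function name is hypothetical.

```python
def rows_where_equal(rows, left, right):
    """Keep the rows in which attribute `left` equals attribute `right`."""
    return [row for row in rows if row[left] == row[right]]


# Made-up table with four attributes.
rows = [
    {"A": "x", "B": "x", "C": 1, "D": 0.5},
    {"A": "B", "B": "y", "C": 2, "D": 0.1},
    {"A": "z", "B": "z", "C": 3, "D": 0.9},
]

same_value = rows_where_equal(rows, "A", "B")         # A's value == B's value
literal_b = [row for row in rows if row["A"] == "B"]  # A's value == the text "B"
print(len(same_value), len(literal_b))
```

A filter that only compares an attribute with a constant implements the second reading, so the task needs a row-wise expression that can refer to both attributes.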

Is RapidMiner suitable for my project ?

Text Mining Words not produced


I have to find the number of occurrences of specific words in an Excel file. For example, I have ten columns, and each row holds one employee. Each employee profile may contain a word that we want to find, together with its number of occurrences.

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="187">
<parameter key="excel_file" value="C:\Users\shahida1\Desktop\Sample Testing Text Mining.xlsx"/>
<parameter key="imported_cell_range" value="A1:F36"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="User Sys ID.true.integer.id"/>
<parameter key="1" value="Employee Id.true.integer.attribute"/>
<parameter key="2" value="Last name.true.polynominal.attribute"/>
<parameter key="3" value="First name.true.polynominal.attribute"/>
<parameter key="4" value="Business Unit.true.polynominal.attribute"/>
<parameter key="5" value="Functional Area.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to Documents" width="90" x="246" y="187">
<parameter key="select_attributes_and_weights" value="true"/>
<list key="specify_weights"/>
</operator>
<operator activated="true" class="text:process_documents" compatibility="8.1.000" expanded="true" height="103" name="Process Documents" width="90" x="447" y="187">
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="179" y="238"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="313" y="238"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="112" y="442">
<parameter key="excel_file" value="C:\Users\shahida1\Desktop\Sample Testing Text Mining.xlsx"/>
<parameter key="sheet_number" value="2"/>
<parameter key="imported_cell_range" value="A1:A2"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Matching Text.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="442">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Matching Text"/>
</operator>
<operator activated="true" class="operator_toolbox:filter_tokens_using_exampleset" compatibility="1.0.000" expanded="true" height="82" name="Filter Tokens Using ExampleSet" width="90" x="581" y="340"/>
<connect from_op="Read Excel" from_port="output" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_op="Filter Tokens Using ExampleSet" to_port="example set"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Filter Tokens Using ExampleSet" to_port="document"/>
<connect from_op="Filter Tokens Using ExampleSet" from_port="document" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

 
This is the file which I am working on.
https://drive.google.com/open?id=1PT38TyBmoIHDfIVCAk7Je3l3AXRP-4Gi
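
For comparison, the counting task described in this post can be sketched in plain Python: tokenize every cell, lowercase the tokens, and count only the words of interest. The rows, keywords, and function name below are made up for illustration. Note also that the process above sets vector_creation to Binary Term Occurrences, which records whether a term is present rather than how often it occurs.

```python
import re
from collections import Counter


def count_keywords(rows, keywords):
    """Count how often each keyword occurs across all cells, ignoring case."""
    wanted = {keyword.lower() for keyword in keywords}
    counts = Counter()
    for row in rows:
        for cell in row:
            for token in re.findall(r"[a-z]+", str(cell).lower()):
                if token in wanted:
                    counts[token] += 1
    return counts


# Made-up rows standing in for the Excel sheet (one employee per row).
rows = [
    ["Marketing & Sales", "Logistics"],
    ["Sales", "General Management"],
    ["Controlling/Accounting", "Marketing & Sales|Marketing & Sales"],
]
counts = count_keywords(rows, ["sales", "logistics"])
print(counts)
```

The same three steps (tokenize, transform cases, filter by a word list) already appear in the attached process, so the missing piece may simply be a count-based vector creation setting.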
